{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kaa2BdMZz9Ua"
   },
   "source": [
    "# Distilling knowlege in Transformer models and test prediction for GLUE tasks, using *torchdistill*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "v9cTkGsg0I6K"
   },
   "source": [
    "## 1. Make sure you have access to GPU/TPU\n",
    "Google Colab: Runtime -> Change runtime type -> Hardware accelarator: \"GPU\" or \"TPU\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-7U6G5N8z06c",
    "outputId": "e7dbb8c6-181a-4f9a-e33c-a2310ded889b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Thu Jun  3 03:02:38 2021       \n",
      "+-----------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |\n",
      "|-------------------------------+----------------------+----------------------+\n",
      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
      "|                               |                      |               MIG M. |\n",
      "|===============================+======================+======================|\n",
      "|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |\n",
      "| N/A   45C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |\n",
      "|                               |                      |                  N/A |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "                                                                               \n",
      "+-----------------------------------------------------------------------------+\n",
      "| Processes:                                                                  |\n",
      "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
      "|        ID   ID                                                   Usage      |\n",
      "|=============================================================================|\n",
      "|  No running processes found                                                 |\n",
      "+-----------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi"
   ]
  },
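  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides `nvidia-smi`, you can confirm from Python that PyTorch actually sees the accelerator (a quick sanity check, not part of the torchdistill examples):\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "# True if the Colab GPU runtime is visible to PyTorch\n",
    "print(torch.cuda.is_available())\n",
    "```"
   ]
  },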
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WtaTzdTy0mMg"
   },
   "source": [
    "## 2. Clone torchdistill repository to use its example code and configuration files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "wR5GGkREVl3s",
    "outputId": "ada8b1fd-d6d4-4983-fde9-a15cadd69643"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cloning into 'torchdistill'...\n",
      "remote: Enumerating objects: 5231, done.\u001b[K\n",
      "remote: Counting objects: 100% (1013/1013), done.\u001b[K\n",
      "remote: Compressing objects: 100% (374/374), done.\u001b[K\n",
      "remote: Total 5231 (delta 575), reused 982 (delta 561), pack-reused 4218\u001b[K\n",
      "Receiving objects: 100% (5231/5231), 1.24 MiB | 20.17 MiB/s, done.\n",
      "Resolving deltas: 100% (3189/3189), done.\n"
     ]
    }
   ],
   "source": [
    "!git clone https://github.com/yoshitomo-matsubara/torchdistill"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZgzJAnV00UN8"
   },
   "source": [
    "## 3. Install dependencies and *torchdistill*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Gz9Y_IpzevFw",
    "outputId": "dc584431-5e06-4ffd-af15-b2f047472d9e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting accelerate\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/f7/fa/d173d923c953d930702066894abf128a7e5258c6f64cf088d2c5a83f46a3/accelerate-0.3.0-py3-none-any.whl (49kB)\n",
      "\r",
      "\u001b[K     |██████▋                         | 10kB 20.9MB/s eta 0:00:01\r",
      "\u001b[K     |█████████████▏                  | 20kB 26.5MB/s eta 0:00:01\r",
      "\u001b[K     |███████████████████▊            | 30kB 31.1MB/s eta 0:00:01\r",
      "\u001b[K     |██████████████████████████▎     | 40kB 35.0MB/s eta 0:00:01\r",
      "\u001b[K     |████████████████████████████████| 51kB 7.5MB/s \n",
      "\u001b[?25hCollecting datasets>=1.1.3\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/94/f8/ff7cd6e3b400b33dcbbfd31c6c1481678a2b2f669f521ad20053009a9aa3/datasets-1.7.0-py3-none-any.whl (234kB)\n",
      "\u001b[K     |████████████████████████████████| 235kB 34.9MB/s \n",
      "\u001b[?25hCollecting sentencepiece!=0.1.92\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)\n",
      "\u001b[K     |████████████████████████████████| 1.2MB 40.5MB/s \n",
      "\u001b[?25hRequirement already satisfied: protobuf in /usr/local/lib/python3.7/dist-packages (from -r torchdistill/examples/hf_transformers/requirements.txt (line 4)) (3.12.4)\n",
      "Requirement already satisfied: torch>=1.8.1 in /usr/local/lib/python3.7/dist-packages (from -r torchdistill/examples/hf_transformers/requirements.txt (line 5)) (1.8.1+cu101)\n",
      "Collecting transformers>=4.6.1\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/d5/43/cfe4ee779bbd6a678ac6a97c5a5cdeb03c35f9eaebbb9720b036680f9a2d/transformers-4.6.1-py3-none-any.whl (2.2MB)\n",
      "\u001b[K     |████████████████████████████████| 2.3MB 38.7MB/s \n",
      "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from -r torchdistill/examples/hf_transformers/requirements.txt (line 7)) (1.1.5)\n",
      "Collecting pyaml>=20.4.0\n",
      "  Downloading https://files.pythonhosted.org/packages/15/c4/1310a054d33abc318426a956e7d6df0df76a6ddfa9c66f6310274fb75d42/pyaml-20.4.0-py2.py3-none-any.whl\n",
      "Requirement already satisfied: tqdm<4.50.0,>=4.27 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (4.41.1)\n",
      "Collecting huggingface-hub<0.1.0\n",
      "  Downloading https://files.pythonhosted.org/packages/32/a1/7c5261396da23ec364e296a4fb8a1cd6a5a2ff457215c6447038f18c0309/huggingface_hub-0.0.9-py3-none-any.whl\n",
      "Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (0.3.3)\n",
      "Requirement already satisfied: pyarrow<4.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (3.0.0)\n",
      "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (1.19.5)\n",
      "Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (0.70.11.1)\n",
      "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (20.9)\n",
      "Collecting fsspec\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/bc/52/816d1a3a599176057bf29dfacb1f8fadb61d35fbd96cb1bab4aaa7df83c0/fsspec-2021.5.0-py3-none-any.whl (111kB)\n",
      "\u001b[K     |████████████████████████████████| 112kB 56.7MB/s \n",
      "\u001b[?25hRequirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2.23.0)\n",
      "Collecting xxhash\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/7d/4f/0a862cad26aa2ed7a7cd87178cbbfa824fc1383e472d63596a0d018374e7/xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243kB)\n",
      "\u001b[K     |████████████████████████████████| 245kB 54.4MB/s \n",
      "\u001b[?25hRequirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (4.0.1)\n",
      "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from protobuf->-r torchdistill/examples/hf_transformers/requirements.txt (line 4)) (57.0.0)\n",
      "Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.7/dist-packages (from protobuf->-r torchdistill/examples/hf_transformers/requirements.txt (line 4)) (1.15.0)\n",
      "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.8.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 5)) (3.7.4.3)\n",
      "Collecting tokenizers<0.11,>=0.10.1\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/d4/e2/df3543e8ffdab68f5acc73f613de9c2b155ac47f162e725dcac87c521c11/tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3MB)\n",
      "\u001b[K     |████████████████████████████████| 3.3MB 46.9MB/s \n",
      "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (3.0.12)\n",
      "Collecting sacremoses\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
      "\u001b[K     |████████████████████████████████| 901kB 45.5MB/s \n",
      "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (2019.12.20)\n",
      "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->-r torchdistill/examples/hf_transformers/requirements.txt (line 7)) (2018.9)\n",
      "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->-r torchdistill/examples/hf_transformers/requirements.txt (line 7)) (2.8.1)\n",
      "Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from pyaml>=20.4.0->accelerate->-r torchdistill/examples/hf_transformers/requirements.txt (line 1)) (3.13)\n",
      "Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2.4.7)\n",
      "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (3.0.4)\n",
      "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (1.24.3)\n",
      "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2.10)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2020.12.5)\n",
      "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (3.4.1)\n",
      "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (1.0.1)\n",
      "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (7.1.2)\n",
      "\u001b[31mERROR: transformers 4.6.1 has requirement huggingface-hub==0.0.8, but you'll have huggingface-hub 0.0.9 which is incompatible.\u001b[0m\n",
      "Installing collected packages: pyaml, accelerate, huggingface-hub, fsspec, xxhash, datasets, sentencepiece, tokenizers, sacremoses, transformers\n",
      "Successfully installed accelerate-0.3.0 datasets-1.7.0 fsspec-2021.5.0 huggingface-hub-0.0.9 pyaml-20.4.0 sacremoses-0.0.45 sentencepiece-0.1.95 tokenizers-0.10.3 transformers-4.6.1 xxhash-2.0.2\n",
      "Collecting torchdistill\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/70/c1/7e28cde90e7eaa7ec424a495be7021a537e4d048d7341e2186427d14ca0f/torchdistill-0.2.1-py3-none-any.whl (78kB)\n",
      "\u001b[K     |████████████████████████████████| 81kB 9.0MB/s \n",
      "\u001b[?25hCollecting pyyaml>=5.4.1\n",
      "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/7a/a5/393c087efdc78091afa2af9f1378762f9821c9c1d7a22c5753fb5ac5f97a/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636kB)\n",
      "\u001b[K     |████████████████████████████████| 645kB 34.4MB/s \n",
      "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchdistill) (1.19.5)\n",
      "Requirement already satisfied: pycocotools>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from torchdistill) (2.0.2)\n",
      "Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from torchdistill) (0.29.23)\n",
      "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from torchdistill) (1.4.1)\n",
      "Requirement already satisfied: torchvision>=0.8.2 in /usr/local/lib/python3.7/dist-packages (from torchdistill) (0.9.1+cu101)\n",
      "Requirement already satisfied: torch>=1.7.1 in /usr/local/lib/python3.7/dist-packages (from torchdistill) (1.8.1+cu101)\n",
      "Requirement already satisfied: matplotlib>=2.1.0 in /usr/local/lib/python3.7/dist-packages (from pycocotools>=2.0.1->torchdistill) (3.2.2)\n",
      "Requirement already satisfied: setuptools>=18.0 in /usr/local/lib/python3.7/dist-packages (from pycocotools>=2.0.1->torchdistill) (57.0.0)\n",
      "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision>=0.8.2->torchdistill) (7.1.2)\n",
      "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.7.1->torchdistill) (3.7.4.3)\n",
      "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (2.4.7)\n",
      "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (2.8.1)\n",
      "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (0.10.0)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (1.3.1)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (1.15.0)\n",
      "Installing collected packages: pyyaml, torchdistill\n",
      "  Found existing installation: PyYAML 3.13\n",
      "    Uninstalling PyYAML-3.13:\n",
      "      Successfully uninstalled PyYAML-3.13\n",
      "Successfully installed pyyaml-5.4.1 torchdistill-0.2.1\n"
     ]
    }
   ],
   "source": [
    "!pip install -r torchdistill/examples/hf_transformers/requirements.txt\n",
    "!pip install torchdistill"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GgvtpJmHSGXr"
   },
   "source": [
    "## (Optional) Configure Accelerate for 2x-speedup training by mixed-precision\n",
    "\n",
    "If you are **NOT** using the Google Colab Pro, it will exceed 12 hours (maximum lifetimes for free Google Colab users) to fine-tune a base-sized model for the following 9 different tasks with Tesla K80.\n",
    "By using mixed-precision training, you can complete all the 9 fine-tuning jobs.\n",
    "[This table](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#mixed-precision-training) gives you a good idea about how long it will take to fine-tune a BERT-Base on a Titan RTX with/without mixed-precision."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "MGI5L9W6SEfT",
    "outputId": "c502f66d-4e09-4b22-e16e-01c17728f4c3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0\n",
      "Which type of machine are you using? ([0] No distributed training, [1] multi-GPU, [2] TPU): 0\n",
      "How many processes in total will you use? [1]: 1\n",
      "Do you wish to use FP16 (mixed precision)? [yes/NO]: yes\n"
     ]
    }
   ],
   "source": [
    "!accelerate config"
   ]
  },
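  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the mechanism that the FP16 option enables, using PyTorch's native AMP utilities: `autocast` picks a per-op precision and `GradScaler` rescales the loss to avoid FP16 gradient underflow. The model, optimizer, and data below are stand-ins for illustration, not torchdistill code:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "# Stand-in model and optimizer for illustration\n",
    "model = torch.nn.Linear(8, 2)\n",
    "optimizer = torch.optim.SGD(model.parameters(), lr=0.1)\n",
    "\n",
    "# Enable AMP only when a GPU is present; on CPU these become no-ops\n",
    "use_fp16 = torch.cuda.is_available()\n",
    "scaler = torch.cuda.amp.GradScaler(enabled=use_fp16)\n",
    "\n",
    "x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))\n",
    "with torch.cuda.amp.autocast(enabled=use_fp16):\n",
    "    loss = torch.nn.functional.cross_entropy(model(x), y)\n",
    "\n",
    "# Scale the loss before backward, then unscale inside step/update\n",
    "scaler.scale(loss).backward()\n",
    "scaler.step(optimizer)\n",
    "scaler.update()\n",
    "```"
   ]
  },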
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RxppgvR51Ij1"
   },
   "source": [
    "## 4. Distill knowledge in Transformer models for GLUE tasks\n",
    "The following examples demonstrate how to distill knowledge in fine-tuned BERT-Large (uncased) to pretrained BERT-Base (uncased) on each of datasets in GLUE.  \n",
    "**Note**: Test splits for GLUE tasks in `datasets` package are not labeled, and you use only training and validation spltis in this example, following [Hugging Face's example](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification)."
   ]
  },
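  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of the distillation objective (not the exact torchdistill implementation; the temperature and weight below are illustrative defaults, and the actual values come from the YAML configs): the loss combines a soft-target term, i.e., the KL divergence between temperature-softened teacher and student logits, with the ordinary cross-entropy on the ground-truth labels.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "def kd_loss(student_logits, teacher_logits, labels, temperature=5.0, alpha=0.9):\n",
    "    # Soft-target term: KL divergence between softened distributions,\n",
    "    # scaled by T^2 to keep gradient magnitudes comparable across temperatures\n",
    "    soft = F.kl_div(\n",
    "        F.log_softmax(student_logits / temperature, dim=-1),\n",
    "        F.softmax(teacher_logits / temperature, dim=-1),\n",
    "        reduction='batchmean') * (temperature ** 2)\n",
    "    # Hard-target term: standard cross-entropy against the labels\n",
    "    hard = F.cross_entropy(student_logits, labels)\n",
    "    return alpha * soft + (1.0 - alpha) * hard\n",
    "```"
   ]
  },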
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bFHCWbIG1paE"
   },
   "source": [
    "### 4.1 CoLA task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4oFHTV4jV7yN",
    "outputId": "ddbcef4b-aff5-46fd-ad4a-0e1e77c9504c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-02 03:03:01.360176: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/02 03:03:03\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)\n",
      "2021/06/02 03:03:03\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/02 03:03:03\tINFO\tfilelock\tLock 139654721540560 acquired on /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpxp_rqbyd\n",
      "Downloading: 100% 699/699 [00:00<00:00, 676kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278\n",
      "creating metadata file for /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278\n",
      "2021/06/02 03:03:04\tINFO\tfilelock\tLock 139654721540560 released on /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"cola\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"cola\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 03:03:05\tINFO\tfilelock\tLock 139654715649680 acquired on /root/.cache/huggingface/transformers/4d276766e35e40fa94099ec43d7652c20ced1ff0dc47e4211ed153b59cbe8dc5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp4550x91r\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.75MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/4d276766e35e40fa94099ec43d7652c20ced1ff0dc47e4211ed153b59cbe8dc5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/4d276766e35e40fa94099ec43d7652c20ced1ff0dc47e4211ed153b59cbe8dc5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 03:03:05\tINFO\tfilelock\tLock 139654715649680 released on /root/.cache/huggingface/transformers/4d276766e35e40fa94099ec43d7652c20ced1ff0dc47e4211ed153b59cbe8dc5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 03:03:05\tINFO\tfilelock\tLock 139654715649680 acquired on /root/.cache/huggingface/transformers/baa22d3f2d1c8276ab91ed135a03677ca16578742fa64673f99f1f76a0bcf039.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp40rl31ye\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.31MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/baa22d3f2d1c8276ab91ed135a03677ca16578742fa64673f99f1f76a0bcf039.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "creating metadata file for /root/.cache/huggingface/transformers/baa22d3f2d1c8276ab91ed135a03677ca16578742fa64673f99f1f76a0bcf039.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "2021/06/02 03:03:06\tINFO\tfilelock\tLock 139654715649680 released on /root/.cache/huggingface/transformers/baa22d3f2d1c8276ab91ed135a03677ca16578742fa64673f99f1f76a0bcf039.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "2021/06/02 03:03:06\tINFO\tfilelock\tLock 139654715629968 acquired on /root/.cache/huggingface/transformers/55fda2100c59f2532badf55f1e55ddd04f1e42f9fba213a24188fca54fe722c5.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpq79y1h1b\n",
      "Downloading: 100% 112/112 [00:00<00:00, 109kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/55fda2100c59f2532badf55f1e55ddd04f1e42f9fba213a24188fca54fe722c5.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/55fda2100c59f2532badf55f1e55ddd04f1e42f9fba213a24188fca54fe722c5.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/02 03:03:07\tINFO\tfilelock\tLock 139654715629968 released on /root/.cache/huggingface/transformers/55fda2100c59f2532badf55f1e55ddd04f1e42f9fba213a24188fca54fe722c5.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/02 03:03:07\tINFO\tfilelock\tLock 139654715629968 acquired on /root/.cache/huggingface/transformers/2ee3386c96fee3e77e6a9bb26d0c25ec7a54c0f8cece147d2b1fc25d77bcdf5a.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpyxqgyz21\n",
      "Downloading: 100% 304/304 [00:00<00:00, 288kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/2ee3386c96fee3e77e6a9bb26d0c25ec7a54c0f8cece147d2b1fc25d77bcdf5a.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/2ee3386c96fee3e77e6a9bb26d0c25ec7a54c0f8cece147d2b1fc25d77bcdf5a.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:03:07\tINFO\tfilelock\tLock 139654715629968 released on /root/.cache/huggingface/transformers/2ee3386c96fee3e77e6a9bb26d0c25ec7a54c0f8cece147d2b1fc25d77bcdf5a.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/4d276766e35e40fa94099ec43d7652c20ced1ff0dc47e4211ed153b59cbe8dc5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/baa22d3f2d1c8276ab91ed135a03677ca16578742fa64673f99f1f76a0bcf039.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/55fda2100c59f2532badf55f1e55ddd04f1e42f9fba213a24188fca54fe722c5.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/2ee3386c96fee3e77e6a9bb26d0c25ec7a54c0f8cece147d2b1fc25d77bcdf5a.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:03:08\tINFO\tfilelock\tLock 139654683257552 acquired on /root/.cache/huggingface/transformers/6d316d39f823337b9d078b49832ec886bb9c595c16dbe74f3e85eaa710d41e6f.a2155877d52bb33708b54943ba6cc4b59e3dd78f2ff9ddc36df3bf8c731cd34e.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmplz94egr7\n",
      "Downloading: 100% 1.34G/1.34G [00:40<00:00, 33.2MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/6d316d39f823337b9d078b49832ec886bb9c595c16dbe74f3e85eaa710d41e6f.a2155877d52bb33708b54943ba6cc4b59e3dd78f2ff9ddc36df3bf8c731cd34e\n",
      "creating metadata file for /root/.cache/huggingface/transformers/6d316d39f823337b9d078b49832ec886bb9c595c16dbe74f3e85eaa710d41e6f.a2155877d52bb33708b54943ba6cc4b59e3dd78f2ff9ddc36df3bf8c731cd34e\n",
      "2021/06/02 03:03:49\tINFO\tfilelock\tLock 139654683257552 released on /root/.cache/huggingface/transformers/6d316d39f823337b9d078b49832ec886bb9c595c16dbe74f3e85eaa710d41e6f.a2155877d52bb33708b54943ba6cc4b59e3dd78f2ff9ddc36df3bf8c731cd34e.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/6d316d39f823337b9d078b49832ec886bb9c595c16dbe74f3e85eaa710d41e6f.a2155877d52bb33708b54943ba6cc4b59e3dd78f2ff9ddc36df3bf8c731cd34e\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-cola.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 03:03:53\tINFO\tfilelock\tLock 139654721513232 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprombhz6e\n",
      "Downloading: 100% 570/570 [00:00<00:00, 502kB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "creating metadata file for /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "2021/06/02 03:03:53\tINFO\tfilelock\tLock 139654721513232 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"cola\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 03:03:54\tINFO\tfilelock\tLock 139654683324496 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpbc_orvh6\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.71MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 03:03:54\tINFO\tfilelock\tLock 139654683324496 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 03:03:54\tINFO\tfilelock\tLock 139654533412176 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpxooypq6l\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.47MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "creating metadata file for /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "2021/06/02 03:03:55\tINFO\tfilelock\tLock 139654533412176 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
      "2021/06/02 03:03:56\tINFO\tfilelock\tLock 139654533411920 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpwahv46n0\n",
      "Downloading: 100% 28.0/28.0 [00:00<00:00, 24.8kB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "creating metadata file for /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "2021/06/02 03:03:56\tINFO\tfilelock\tLock 139654533411920 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "2021/06/02 03:03:56\tINFO\tfilelock\tLock 139654533888080 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpj1g9vr2x\n",
      "Downloading: 100% 440M/440M [00:26<00:00, 16.4MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "creating metadata file for /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "2021/06/02 03:04:23\tINFO\tfilelock\tLock 139654533888080 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading: 28.8kB [00:00, 23.7MB/s]       \n",
      "Downloading: 28.7kB [00:00, 25.9MB/s]       \n",
      "Downloading and preparing dataset glue/cola (download: 368.14 KiB, generated: 596.73 KiB, post-processed: Unknown size, total: 964.86 KiB) to /root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 377k/377k [00:00<00:00, 3.62MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 9/9 [00:00<00:00, 17.29ba/s]\n",
      "100% 2/2 [00:00<00:00, 41.06ba/s]\n",
      "100% 2/2 [00:00<00:00, 61.21ba/s]\n",
      "Downloading: 5.75kB [00:00, 5.71MB/s]       \n",
      "2021/06/02 03:04:30\tINFO\t__main__\tStart training\n",
      "2021/06/02 03:04:30\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/02 03:04:30\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/02 03:04:30\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/02 03:04:30\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/02 03:04:30\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/02 03:04:30\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/02 03:04:41\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [  0/535]  eta: 0:02:04  lr: 9.993769470404985e-05  sample/s: 18.613203810681846  loss: 0.2642 (0.2642)  time: 0.2330  data: 0.0181  max mem: 1758\n",
      "2021/06/02 03:04:46\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 50/535]  eta: 0:00:53  lr: 9.682242990654206e-05  sample/s: 38.381873794648065  loss: 0.1937 (0.2234)  time: 0.1062  data: 0.0016  max mem: 2761\n",
      "2021/06/02 03:04:52\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [100/535]  eta: 0:00:47  lr: 9.370716510903426e-05  sample/s: 37.50783256837149  loss: 0.2070 (0.2097)  time: 0.1063  data: 0.0016  max mem: 2830\n",
      "2021/06/02 03:04:57\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [150/535]  eta: 0:00:41  lr: 9.059190031152648e-05  sample/s: 38.2010555988178  loss: 0.1565 (0.1973)  time: 0.1066  data: 0.0016  max mem: 2830\n",
      "2021/06/02 03:05:03\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [200/535]  eta: 0:00:36  lr: 8.74766355140187e-05  sample/s: 38.03098293988838  loss: 0.1862 (0.1926)  time: 0.1067  data: 0.0016  max mem: 2830\n",
      "2021/06/02 03:05:08\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [250/535]  eta: 0:00:30  lr: 8.436137071651092e-05  sample/s: 38.36976016064  loss: 0.1520 (0.1900)  time: 0.1072  data: 0.0017  max mem: 2830\n",
      "2021/06/02 03:05:13\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [300/535]  eta: 0:00:25  lr: 8.124610591900313e-05  sample/s: 37.868143724198326  loss: 0.1603 (0.1856)  time: 0.1075  data: 0.0017  max mem: 2912\n",
      "2021/06/02 03:05:19\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [350/535]  eta: 0:00:19  lr: 7.813084112149533e-05  sample/s: 38.10950919618296  loss: 0.1283 (0.1797)  time: 0.1076  data: 0.0017  max mem: 2912\n",
      "2021/06/02 03:05:24\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [400/535]  eta: 0:00:14  lr: 7.501557632398754e-05  sample/s: 37.36473858159693  loss: 0.1094 (0.1740)  time: 0.1066  data: 0.0017  max mem: 2912\n",
      "2021/06/02 03:05:29\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [450/535]  eta: 0:00:09  lr: 7.190031152647976e-05  sample/s: 36.87632511940637  loss: 0.1091 (0.1689)  time: 0.1073  data: 0.0016  max mem: 2912\n",
      "2021/06/02 03:05:35\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [500/535]  eta: 0:00:03  lr: 6.878504672897197e-05  sample/s: 34.189815531843784  loss: 0.1447 (0.1654)  time: 0.1071  data: 0.0016  max mem: 2912\n",
      "2021/06/02 03:05:38\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:57\n",
      "2021/06/02 03:05:39\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
      "2021/06/02 03:05:39\tINFO\t__main__\tValidation: matthews_correlation = 0.5079531963854501\n",
      "2021/06/02 03:05:39\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:05:40\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [  0/535]  eta: 0:01:05  lr: 6.660436137071651e-05  sample/s: 33.428073880730835  loss: 0.0891 (0.0891)  time: 0.1223  data: 0.0026  max mem: 2912\n",
      "2021/06/02 03:05:46\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 50/535]  eta: 0:00:52  lr: 6.348909657320873e-05  sample/s: 37.63234855400959  loss: 0.0743 (0.0806)  time: 0.1079  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:05:51\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [100/535]  eta: 0:00:47  lr: 6.037383177570094e-05  sample/s: 38.11210634996718  loss: 0.0654 (0.0797)  time: 0.1089  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:05:56\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [150/535]  eta: 0:00:41  lr: 5.7258566978193154e-05  sample/s: 37.9776940228402  loss: 0.0665 (0.0787)  time: 0.1074  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:06:02\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [200/535]  eta: 0:00:36  lr: 5.414330218068536e-05  sample/s: 37.083358568053065  loss: 0.0682 (0.0784)  time: 0.1073  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:06:07\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [250/535]  eta: 0:00:30  lr: 5.1028037383177574e-05  sample/s: 36.69685728284455  loss: 0.0637 (0.0774)  time: 0.1080  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:06:13\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [300/535]  eta: 0:00:25  lr: 4.791277258566979e-05  sample/s: 37.88541711359156  loss: 0.0531 (0.0773)  time: 0.1065  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:06:18\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [350/535]  eta: 0:00:19  lr: 4.4797507788161994e-05  sample/s: 37.865323330534714  loss: 0.0429 (0.0766)  time: 0.1078  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:06:23\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [400/535]  eta: 0:00:14  lr: 4.168224299065421e-05  sample/s: 36.907476214046085  loss: 0.0381 (0.0747)  time: 0.1079  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:06:29\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [450/535]  eta: 0:00:09  lr: 3.856697819314642e-05  sample/s: 38.62611862810781  loss: 0.0421 (0.0740)  time: 0.1073  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:06:34\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [500/535]  eta: 0:00:03  lr: 3.545171339563863e-05  sample/s: 37.41857321920038  loss: 0.0269 (0.0725)  time: 0.1076  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:06:38\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:57\n",
      "2021/06/02 03:06:38\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
      "2021/06/02 03:06:38\tINFO\t__main__\tValidation: matthews_correlation = 0.5722723257874831\n",
      "2021/06/02 03:06:38\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:06:40\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [  0/535]  eta: 0:01:06  lr: 3.327102803738318e-05  sample/s: 32.83983714375196  loss: 0.0766 (0.0766)  time: 0.1244  data: 0.0026  max mem: 2915\n",
      "2021/06/02 03:06:45\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 50/535]  eta: 0:00:52  lr: 3.015576323987539e-05  sample/s: 38.129949704659765  loss: 0.0096 (0.0380)  time: 0.1077  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:06:51\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [100/535]  eta: 0:00:47  lr: 2.7040498442367603e-05  sample/s: 34.489866149712505  loss: 0.0283 (0.0358)  time: 0.1075  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:06:56\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [150/535]  eta: 0:00:41  lr: 2.3925233644859816e-05  sample/s: 37.44605021460281  loss: 0.0060 (0.0329)  time: 0.1086  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:07:01\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [200/535]  eta: 0:00:36  lr: 2.0809968847352026e-05  sample/s: 36.51125986925146  loss: 0.0018 (0.0339)  time: 0.1091  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:07:07\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [250/535]  eta: 0:00:30  lr: 1.769470404984424e-05  sample/s: 37.7540404426822  loss: 0.0270 (0.0341)  time: 0.1089  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:07:12\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [300/535]  eta: 0:00:25  lr: 1.457943925233645e-05  sample/s: 36.77762286816608  loss: 0.0005 (0.0329)  time: 0.1083  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:07:18\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [350/535]  eta: 0:00:20  lr: 1.1464174454828661e-05  sample/s: 37.82272670912472  loss: 0.0249 (0.0341)  time: 0.1089  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:07:23\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [400/535]  eta: 0:00:14  lr: 8.348909657320873e-06  sample/s: 37.62762209139333  loss: 0.0022 (0.0337)  time: 0.1087  data: 0.0017  max mem: 2915\n",
      "2021/06/02 03:07:29\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [450/535]  eta: 0:00:09  lr: 5.233644859813085e-06  sample/s: 37.48461937190555  loss: 0.0027 (0.0331)  time: 0.1084  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:07:34\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [500/535]  eta: 0:00:03  lr: 2.118380062305296e-06  sample/s: 37.50120926822508  loss: 0.0006 (0.0327)  time: 0.1083  data: 0.0016  max mem: 2915\n",
      "2021/06/02 03:07:38\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:58\n",
      "2021/06/02 03:07:38\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
      "2021/06/02 03:07:38\tINFO\t__main__\tValidation: matthews_correlation = 0.5884833204348178\n",
      "2021/06/02 03:07:38\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "tokenizer config file saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/02 03:07:40\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/02 03:07:41\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
      "2021/06/02 03:07:41\tINFO\t__main__\tTest: matthews_correlation = 0.6335324951654004\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"cola\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/cola/kd/cola-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 03:07:44\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/02 03:07:45\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
      "2021/06/02 03:07:45\tINFO\t__main__\tTest: matthews_correlation = 0.5884833204348178\n",
      "2021/06/02 03:07:45\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/02 03:07:45\tINFO\t__main__\tcola/test: 1063 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task cola \\\n",
    "  --run_log log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6MzjOFPY1w1r"
   },
   "source": [
    "### 4.2 SST-2 task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "acMDi9f3pd50",
    "outputId": "58084382-f96f-41bf-dc65-52835a3387cb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-02 03:07:49.381458: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/02 03:07:51\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='sst2', test_only=False, world_size=1)\n",
      "2021/06/02 03:07:51\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/02 03:07:51\tINFO\tfilelock\tLock 140107769119120 acquired on /root/.cache/huggingface/transformers/8e456da2df0487e2723d5c9a14dbbc8b15a3bac0e29fe9083effe4d206ea0115.d9ebfbafdb59660a02fc9ce1616d7406b757d783672f5959e587d5ccb82a7850.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpi2ux52_x\n",
      "Downloading: 100% 699/699 [00:00<00:00, 677kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/8e456da2df0487e2723d5c9a14dbbc8b15a3bac0e29fe9083effe4d206ea0115.d9ebfbafdb59660a02fc9ce1616d7406b757d783672f5959e587d5ccb82a7850\n",
      "creating metadata file for /root/.cache/huggingface/transformers/8e456da2df0487e2723d5c9a14dbbc8b15a3bac0e29fe9083effe4d206ea0115.d9ebfbafdb59660a02fc9ce1616d7406b757d783672f5959e587d5ccb82a7850\n",
      "2021/06/02 03:07:51\tINFO\tfilelock\tLock 140107769119120 released on /root/.cache/huggingface/transformers/8e456da2df0487e2723d5c9a14dbbc8b15a3bac0e29fe9083effe4d206ea0115.d9ebfbafdb59660a02fc9ce1616d7406b757d783672f5959e587d5ccb82a7850.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/8e456da2df0487e2723d5c9a14dbbc8b15a3bac0e29fe9083effe4d206ea0115.d9ebfbafdb59660a02fc9ce1616d7406b757d783672f5959e587d5ccb82a7850\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"sst2\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 03:07:52\tINFO\tfilelock\tLock 140107730479568 acquired on /root/.cache/huggingface/transformers/cf4611f3acb47cfbf00d78de63722bc0a2a7502aa4a951c63e37435c1445c61d.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpd0eteqr5\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.71MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/cf4611f3acb47cfbf00d78de63722bc0a2a7502aa4a951c63e37435c1445c61d.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/cf4611f3acb47cfbf00d78de63722bc0a2a7502aa4a951c63e37435c1445c61d.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 03:07:52\tINFO\tfilelock\tLock 140107730479568 released on /root/.cache/huggingface/transformers/cf4611f3acb47cfbf00d78de63722bc0a2a7502aa4a951c63e37435c1445c61d.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 03:07:53\tINFO\tfilelock\tLock 140107730466832 acquired on /root/.cache/huggingface/transformers/b6e863385446791007ed96eb9641f7dda5ec85d34312b94cbf0228373105d6ba.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpejybyrte\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.48MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/b6e863385446791007ed96eb9641f7dda5ec85d34312b94cbf0228373105d6ba.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "creating metadata file for /root/.cache/huggingface/transformers/b6e863385446791007ed96eb9641f7dda5ec85d34312b94cbf0228373105d6ba.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "2021/06/02 03:07:53\tINFO\tfilelock\tLock 140107730466832 released on /root/.cache/huggingface/transformers/b6e863385446791007ed96eb9641f7dda5ec85d34312b94cbf0228373105d6ba.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829.lock\n",
      "2021/06/02 03:07:54\tINFO\tfilelock\tLock 140107730464912 acquired on /root/.cache/huggingface/transformers/af1346571ee3d0f30e5a92eb5b4f4bbfd07a434abe5bc7e6acd1fd65ecb1f83c.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpcmqhoogh\n",
      "Downloading: 100% 112/112 [00:00<00:00, 106kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/af1346571ee3d0f30e5a92eb5b4f4bbfd07a434abe5bc7e6acd1fd65ecb1f83c.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/af1346571ee3d0f30e5a92eb5b4f4bbfd07a434abe5bc7e6acd1fd65ecb1f83c.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/02 03:07:54\tINFO\tfilelock\tLock 140107730464912 released on /root/.cache/huggingface/transformers/af1346571ee3d0f30e5a92eb5b4f4bbfd07a434abe5bc7e6acd1fd65ecb1f83c.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/02 03:07:54\tINFO\tfilelock\tLock 140107730464912 acquired on /root/.cache/huggingface/transformers/2cc146e82436120bb043792e6351bde0ecccd6f1e99bc5c6461d608ea69d8559.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp48mqaad8\n",
      "Downloading: 100% 304/304 [00:00<00:00, 288kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/2cc146e82436120bb043792e6351bde0ecccd6f1e99bc5c6461d608ea69d8559.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/2cc146e82436120bb043792e6351bde0ecccd6f1e99bc5c6461d608ea69d8559.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:07:55\tINFO\tfilelock\tLock 140107730464912 released on /root/.cache/huggingface/transformers/2cc146e82436120bb043792e6351bde0ecccd6f1e99bc5c6461d608ea69d8559.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/cf4611f3acb47cfbf00d78de63722bc0a2a7502aa4a951c63e37435c1445c61d.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/b6e863385446791007ed96eb9641f7dda5ec85d34312b94cbf0228373105d6ba.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/af1346571ee3d0f30e5a92eb5b4f4bbfd07a434abe5bc7e6acd1fd65ecb1f83c.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/2cc146e82436120bb043792e6351bde0ecccd6f1e99bc5c6461d608ea69d8559.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:07:55\tINFO\tfilelock\tLock 140107730885840 acquired on /root/.cache/huggingface/transformers/bbc464a17228be840c065675c19d6b3327bf27a6653b037e809505da8684251a.6620f90c7fd16cd27354e810a7a41714fc971691934aee3346898765921685c8.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp6y6_nksd\n",
      "Downloading: 100% 1.34G/1.34G [00:39<00:00, 33.7MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/bbc464a17228be840c065675c19d6b3327bf27a6653b037e809505da8684251a.6620f90c7fd16cd27354e810a7a41714fc971691934aee3346898765921685c8\n",
      "creating metadata file for /root/.cache/huggingface/transformers/bbc464a17228be840c065675c19d6b3327bf27a6653b037e809505da8684251a.6620f90c7fd16cd27354e810a7a41714fc971691934aee3346898765921685c8\n",
      "2021/06/02 03:08:35\tINFO\tfilelock\tLock 140107730885840 released on /root/.cache/huggingface/transformers/bbc464a17228be840c065675c19d6b3327bf27a6653b037e809505da8684251a.6620f90c7fd16cd27354e810a7a41714fc971691934aee3346898765921685c8.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/bbc464a17228be840c065675c19d6b3327bf27a6653b037e809505da8684251a.6620f90c7fd16cd27354e810a7a41714fc971691934aee3346898765921685c8\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-sst2.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"sst2\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading and preparing dataset glue/sst2 (download: 7.09 MiB, generated: 4.81 MiB, post-processed: Unknown size, total: 11.90 MiB) to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 7.44M/7.44M [00:00<00:00, 28.1MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 68/68 [00:02<00:00, 26.27ba/s]\n",
      "100% 1/1 [00:00<00:00, 18.95ba/s]\n",
      "100% 2/2 [00:00<00:00,  9.22ba/s]\n",
      "2021/06/02 03:08:50\tINFO\t__main__\tStart training\n",
      "2021/06/02 03:08:50\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/02 03:08:50\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/02 03:08:50\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/02 03:08:50\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/02 03:08:50\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/02 03:08:50\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/02 03:08:54\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [   0/2105]  eta: 0:06:09  lr: 9.998416468725257e-05  sample/s: 23.70242446247459  loss: 0.3713 (0.3713)  time: 0.1756  data: 0.0068  max mem: 1761\n",
      "2021/06/02 03:10:28\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 500/2105]  eta: 0:05:01  lr: 9.20665083135392e-05  sample/s: 19.191068609732103  loss: 0.0478 (0.0943)  time: 0.1949  data: 0.0032  max mem: 3804\n",
      "2021/06/02 03:12:04\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1000/2105]  eta: 0:03:30  lr: 8.414885193982581e-05  sample/s: 18.831824554353645  loss: 0.0276 (0.0728)  time: 0.1969  data: 0.0031  max mem: 3807\n",
      "2021/06/02 03:13:44\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1500/2105]  eta: 0:01:56  lr: 7.623119556611244e-05  sample/s: 21.59249311126355  loss: 0.0269 (0.0609)  time: 0.2002  data: 0.0032  max mem: 3807\n",
      "2021/06/02 03:15:23\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [2000/2105]  eta: 0:00:20  lr: 6.831353919239906e-05  sample/s: 16.62580479572533  loss: 0.0272 (0.0534)  time: 0.2020  data: 0.0031  max mem: 3807\n",
      "2021/06/02 03:15:44\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:06:49\n",
      "2021/06/02 03:15:45\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
      "2021/06/02 03:15:45\tINFO\t__main__\tValidation: accuracy = 0.9174311926605505\n",
      "2021/06/02 03:15:45\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:15:46\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [   0/2105]  eta: 0:07:27  lr: 6.665083135391924e-05  sample/s: 19.45861285084667  loss: 0.0151 (0.0151)  time: 0.2125  data: 0.0070  max mem: 3807\n",
      "2021/06/02 03:17:26\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 500/2105]  eta: 0:05:22  lr: 5.8733174980205864e-05  sample/s: 22.36395579786454  loss: 0.0079 (0.0149)  time: 0.1920  data: 0.0032  max mem: 3807\n",
      "2021/06/02 03:19:07\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1000/2105]  eta: 0:03:41  lr: 5.081551860649249e-05  sample/s: 22.3488518003935  loss: 0.0066 (0.0143)  time: 0.1966  data: 0.0032  max mem: 3807\n",
      "2021/06/02 03:20:46\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1500/2105]  eta: 0:02:00  lr: 4.28978622327791e-05  sample/s: 18.662605745432298  loss: 0.0057 (0.0137)  time: 0.1867  data: 0.0032  max mem: 3807\n",
      "2021/06/02 03:22:25\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [2000/2105]  eta: 0:00:20  lr: 3.4980205859065716e-05  sample/s: 22.408462668625617  loss: 0.0034 (0.0132)  time: 0.2034  data: 0.0031  max mem: 3807\n",
      "2021/06/02 03:22:45\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:06:59\n",
      "2021/06/02 03:22:46\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
      "2021/06/02 03:22:46\tINFO\t__main__\tValidation: accuracy = 0.9243119266055045\n",
      "2021/06/02 03:22:46\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:22:48\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [   0/2105]  eta: 0:07:32  lr: 3.3317498020585904e-05  sample/s: 19.311792162347267  loss: 0.0022 (0.0022)  time: 0.2150  data: 0.0079  max mem: 3807\n",
      "2021/06/02 03:24:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 500/2105]  eta: 0:05:21  lr: 2.5399841646872525e-05  sample/s: 18.884605816928502  loss: 0.0032 (0.0058)  time: 0.1991  data: 0.0032  max mem: 3807\n",
      "2021/06/02 03:26:08\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1000/2105]  eta: 0:03:41  lr: 1.7482185273159146e-05  sample/s: 16.59830289311511  loss: 0.0028 (0.0057)  time: 0.1937  data: 0.0033  max mem: 3807\n",
      "2021/06/02 03:27:46\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1500/2105]  eta: 0:02:00  lr: 9.564528899445763e-06  sample/s: 22.247674413978842  loss: 0.0031 (0.0055)  time: 0.1884  data: 0.0031  max mem: 3807\n",
      "2021/06/02 03:29:26\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [2000/2105]  eta: 0:00:20  lr: 1.6468725257323833e-06  sample/s: 16.867813023750564  loss: 0.0035 (0.0055)  time: 0.2050  data: 0.0031  max mem: 3807\n",
      "2021/06/02 03:29:46\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:06:58\n",
      "2021/06/02 03:29:47\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
      "2021/06/02 03:29:47\tINFO\t__main__\tValidation: accuracy = 0.9243119266055045\n",
      "tokenizer config file saved in ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/02 03:29:47\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/02 03:29:50\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
      "2021/06/02 03:29:50\tINFO\t__main__\tTest: accuracy = 0.9346330275229358\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"sst2\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/sst2/kd/sst2-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 03:29:53\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/02 03:29:54\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
      "2021/06/02 03:29:54\tINFO\t__main__\tTest: accuracy = 0.9243119266055045\n",
      "2021/06/02 03:29:54\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/02 03:29:54\tINFO\t__main__\tsst2/test: 1821 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task sst2 \\\n",
    "  --run_log log/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pjKsN2wz10Lb"
   },
   "source": [
    "### 4.3 MRPC task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "NTHMMfEWpsdN",
    "outputId": "fcb83519-fcb4-426d-f6bc-4345894fd5d9"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-02 03:30:00.642949: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/02 03:30:02\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)\n",
      "2021/06/02 03:30:02\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/02 03:30:03\tINFO\tfilelock\tLock 139909012588176 acquired on /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpaz9xkayg\n",
      "Downloading: 100% 699/699 [00:00<00:00, 537kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f\n",
      "creating metadata file for /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f\n",
      "2021/06/02 03:30:03\tINFO\tfilelock\tLock 139909012588176 released on /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mrpc\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mrpc\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 03:30:04\tINFO\tfilelock\tLock 139909046059088 acquired on /root/.cache/huggingface/transformers/064ac7244ee80f0023f1a8e14a56258ce8ebaa9369994b5a6489cf694e4328e8.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpg1sc_jut\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.74MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/064ac7244ee80f0023f1a8e14a56258ce8ebaa9369994b5a6489cf694e4328e8.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/064ac7244ee80f0023f1a8e14a56258ce8ebaa9369994b5a6489cf694e4328e8.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 03:30:04\tINFO\tfilelock\tLock 139909046059088 released on /root/.cache/huggingface/transformers/064ac7244ee80f0023f1a8e14a56258ce8ebaa9369994b5a6489cf694e4328e8.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 03:30:04\tINFO\tfilelock\tLock 139908972624592 acquired on /root/.cache/huggingface/transformers/7fd4415825305382cd5e0693f06a8111acf2f578138c59f7dfb8a8275cef2262.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpsg1wckaw\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.62MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/7fd4415825305382cd5e0693f06a8111acf2f578138c59f7dfb8a8275cef2262.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "creating metadata file for /root/.cache/huggingface/transformers/7fd4415825305382cd5e0693f06a8111acf2f578138c59f7dfb8a8275cef2262.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "2021/06/02 03:30:05\tINFO\tfilelock\tLock 139908972624592 released on /root/.cache/huggingface/transformers/7fd4415825305382cd5e0693f06a8111acf2f578138c59f7dfb8a8275cef2262.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "2021/06/02 03:30:06\tINFO\tfilelock\tLock 139908972623184 acquired on /root/.cache/huggingface/transformers/594b3c1c958a692246804abd873d23f5c5d3d288eac4c0826762f7a411c91b87.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp0mzw5jru\n",
      "Downloading: 100% 112/112 [00:00<00:00, 111kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/594b3c1c958a692246804abd873d23f5c5d3d288eac4c0826762f7a411c91b87.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/594b3c1c958a692246804abd873d23f5c5d3d288eac4c0826762f7a411c91b87.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/02 03:30:06\tINFO\tfilelock\tLock 139908972623184 released on /root/.cache/huggingface/transformers/594b3c1c958a692246804abd873d23f5c5d3d288eac4c0826762f7a411c91b87.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/02 03:30:06\tINFO\tfilelock\tLock 139908972638672 acquired on /root/.cache/huggingface/transformers/d42b7749674affce7d3a6f7bae5707c60be36ab33525ef6f45d4fd09a42e683b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpul1lycep\n",
      "Downloading: 100% 304/304 [00:00<00:00, 288kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/d42b7749674affce7d3a6f7bae5707c60be36ab33525ef6f45d4fd09a42e683b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/d42b7749674affce7d3a6f7bae5707c60be36ab33525ef6f45d4fd09a42e683b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:30:06\tINFO\tfilelock\tLock 139908972638672 released on /root/.cache/huggingface/transformers/d42b7749674affce7d3a6f7bae5707c60be36ab33525ef6f45d4fd09a42e683b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/064ac7244ee80f0023f1a8e14a56258ce8ebaa9369994b5a6489cf694e4328e8.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/7fd4415825305382cd5e0693f06a8111acf2f578138c59f7dfb8a8275cef2262.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/594b3c1c958a692246804abd873d23f5c5d3d288eac4c0826762f7a411c91b87.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/d42b7749674affce7d3a6f7bae5707c60be36ab33525ef6f45d4fd09a42e683b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:30:07\tINFO\tfilelock\tLock 139908972695440 acquired on /root/.cache/huggingface/transformers/e0ce648bb6909aa61e161400f3667d6dbae9195989c03387cffbd48eb35bd3fb.cd2fe4d95f6f6a6ada6afd35c4ee4e2b20fd853695f0c8c64cc6bceb322f1960.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprj4yhxdo\n",
      "Downloading: 100% 1.34G/1.34G [00:32<00:00, 40.8MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/e0ce648bb6909aa61e161400f3667d6dbae9195989c03387cffbd48eb35bd3fb.cd2fe4d95f6f6a6ada6afd35c4ee4e2b20fd853695f0c8c64cc6bceb322f1960\n",
      "creating metadata file for /root/.cache/huggingface/transformers/e0ce648bb6909aa61e161400f3667d6dbae9195989c03387cffbd48eb35bd3fb.cd2fe4d95f6f6a6ada6afd35c4ee4e2b20fd853695f0c8c64cc6bceb322f1960\n",
      "2021/06/02 03:30:40\tINFO\tfilelock\tLock 139908972695440 released on /root/.cache/huggingface/transformers/e0ce648bb6909aa61e161400f3667d6dbae9195989c03387cffbd48eb35bd3fb.cd2fe4d95f6f6a6ada6afd35c4ee4e2b20fd853695f0c8c64cc6bceb322f1960.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/e0ce648bb6909aa61e161400f3667d6dbae9195989c03387cffbd48eb35bd3fb.cd2fe4d95f6f6a6ada6afd35c4ee4e2b20fd853695f0c8c64cc6bceb322f1960\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-mrpc.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mrpc\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading and preparing dataset glue/mrpc (download: 1.43 MiB, generated: 1.43 MiB, post-processed: Unknown size, total: 2.85 MiB) to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 6.22kB [00:00, 4.78MB/s]\n",
      "Downloading: 1.05MB [00:00, 6.85MB/s]\n",
      "Downloading: 441kB [00:00, 4.63MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 4/4 [00:00<00:00,  7.74ba/s]\n",
      "100% 1/1 [00:00<00:00, 20.21ba/s]\n",
      "100% 2/2 [00:00<00:00,  4.06ba/s]\n",
      "2021/06/02 03:30:53\tINFO\t__main__\tStart training\n",
      "2021/06/02 03:30:53\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/02 03:30:53\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/02 03:30:53\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/02 03:30:53\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/02 03:30:53\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/02 03:30:53\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/02 03:30:56\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [  0/230]  eta: 0:00:46  lr: 9.991304347826087e-05  sample/s: 20.392326934249965  loss: 0.1503 (0.1503)  time: 0.2009  data: 0.0048  max mem: 1977\n",
      "2021/06/02 03:31:06\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 50/230]  eta: 0:00:32  lr: 9.556521739130435e-05  sample/s: 17.23837706319458  loss: 0.1290 (0.1382)  time: 0.1803  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:31:15\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [100/230]  eta: 0:00:24  lr: 9.121739130434783e-05  sample/s: 21.91641901009655  loss: 0.1001 (0.1266)  time: 0.1900  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:31:25\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [150/230]  eta: 0:00:14  lr: 8.686956521739131e-05  sample/s: 21.776035636593964  loss: 0.0968 (0.1191)  time: 0.1883  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:31:34\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [200/230]  eta: 0:00:05  lr: 8.252173913043479e-05  sample/s: 18.693318536642977  loss: 0.0793 (0.1138)  time: 0.1928  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:31:40\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:43\n",
      "2021/06/02 03:31:40\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:31:40\tINFO\t__main__\tValidation: accuracy = 0.7034313725490197, f1 = 0.7280898876404496\n",
      "2021/06/02 03:31:40\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:31:42\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [  0/230]  eta: 0:00:49  lr: 7.991304347826087e-05  sample/s: 18.857097256050615  loss: 0.1572 (0.1572)  time: 0.2153  data: 0.0032  max mem: 3526\n",
      "2021/06/02 03:31:51\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 50/230]  eta: 0:00:35  lr: 7.556521739130435e-05  sample/s: 20.82654123979448  loss: 0.0390 (0.0649)  time: 0.2016  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:32:01\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [100/230]  eta: 0:00:25  lr: 7.121739130434783e-05  sample/s: 18.69269370983412  loss: 0.0597 (0.0700)  time: 0.1927  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:32:11\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [150/230]  eta: 0:00:15  lr: 6.686956521739131e-05  sample/s: 19.015017329431473  loss: 0.0377 (0.0645)  time: 0.1981  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:32:21\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [200/230]  eta: 0:00:05  lr: 6.252173913043479e-05  sample/s: 21.69064626356052  loss: 0.0610 (0.0606)  time: 0.1898  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:32:26\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:44\n",
      "2021/06/02 03:32:27\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:32:27\tINFO\t__main__\tValidation: accuracy = 0.8602941176470589, f1 = 0.9048414023372287\n",
      "2021/06/02 03:32:27\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:32:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [  0/230]  eta: 0:00:48  lr: 5.9913043478260875e-05  sample/s: 19.17352289090535  loss: 0.0641 (0.0641)  time: 0.2117  data: 0.0031  max mem: 3526\n",
      "2021/06/02 03:32:38\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 50/230]  eta: 0:00:34  lr: 5.556521739130435e-05  sample/s: 21.712120427791984  loss: 0.0400 (0.0358)  time: 0.1891  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:32:47\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [100/230]  eta: 0:00:24  lr: 5.1217391304347826e-05  sample/s: 22.38511183072019  loss: 0.0263 (0.0301)  time: 0.1942  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:32:57\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [150/230]  eta: 0:00:15  lr: 4.686956521739131e-05  sample/s: 22.787513955933207  loss: 0.0181 (0.0285)  time: 0.1968  data: 0.0026  max mem: 3526\n",
      "2021/06/02 03:33:07\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [200/230]  eta: 0:00:05  lr: 4.252173913043478e-05  sample/s: 22.128625845459055  loss: 0.0157 (0.0291)  time: 0.1970  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:33:12\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:44\n",
      "2021/06/02 03:33:13\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:33:13\tINFO\t__main__\tValidation: accuracy = 0.8504901960784313, f1 = 0.896434634974533\n",
      "2021/06/02 03:33:13\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [  0/230]  eta: 0:00:51  lr: 3.991304347826087e-05  sample/s: 18.02929655806816  loss: 0.0000 (0.0000)  time: 0.2245  data: 0.0026  max mem: 3526\n",
      "2021/06/02 03:33:23\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [ 50/230]  eta: 0:00:34  lr: 3.556521739130435e-05  sample/s: 21.42788028026903  loss: 0.0001 (0.0126)  time: 0.1886  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:33:33\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [100/230]  eta: 0:00:25  lr: 3.121739130434783e-05  sample/s: 22.36166058216608  loss: 0.0005 (0.0141)  time: 0.1932  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:33:42\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [150/230]  eta: 0:00:15  lr: 2.6869565217391306e-05  sample/s: 22.610681344035882  loss: 0.0001 (0.0132)  time: 0.1855  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:33:51\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [200/230]  eta: 0:00:05  lr: 2.252173913043478e-05  sample/s: 21.671621740321072  loss: 0.0002 (0.0136)  time: 0.1847  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:33:57\tINFO\ttorchdistill.misc.log\tEpoch: [3] Total time: 0:00:44\n",
      "2021/06/02 03:33:58\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:33:58\tINFO\t__main__\tValidation: accuracy = 0.8578431372549019, f1 = 0.9016949152542373\n",
      "2021/06/02 03:33:58\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [  0/230]  eta: 0:00:42  lr: 1.9913043478260872e-05  sample/s: 21.735101276728052  loss: 0.0009 (0.0009)  time: 0.1864  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:34:08\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [ 50/230]  eta: 0:00:34  lr: 1.5565217391304347e-05  sample/s: 21.783980793676143  loss: 0.0001 (0.0062)  time: 0.1912  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:34:17\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [100/230]  eta: 0:00:24  lr: 1.1217391304347827e-05  sample/s: 18.40484880397997  loss: 0.0001 (0.0075)  time: 0.1925  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:34:27\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [150/230]  eta: 0:00:15  lr: 6.869565217391305e-06  sample/s: 22.9353915726474  loss: 0.0001 (0.0062)  time: 0.1951  data: 0.0024  max mem: 3526\n",
      "2021/06/02 03:34:37\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [200/230]  eta: 0:00:05  lr: 2.5217391304347826e-06  sample/s: 21.496298384306893  loss: 0.0001 (0.0073)  time: 0.1935  data: 0.0025  max mem: 3526\n",
      "2021/06/02 03:34:42\tINFO\ttorchdistill.misc.log\tEpoch: [4] Total time: 0:00:44\n",
      "2021/06/02 03:34:43\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:34:43\tINFO\t__main__\tValidation: accuracy = 0.8676470588235294, f1 = 0.9049295774647886\n",
      "2021/06/02 03:34:43\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "tokenizer config file saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/02 03:34:44\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/02 03:34:46\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:34:46\tINFO\t__main__\tTest: accuracy = 0.8799019607843137, f1 = 0.9162393162393162\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mrpc\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 03:34:49\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/02 03:34:50\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
      "2021/06/02 03:34:50\tINFO\t__main__\tTest: accuracy = 0.8676470588235294, f1 = 0.9049295774647886\n",
      "2021/06/02 03:34:50\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/02 03:34:50\tINFO\t__main__\tmrpc/test: 1725 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
     "  --task_name mrpc \\\n",
     "  --log log/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
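  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The distilled student checkpoint saved above can be reloaded with the standard *transformers* API. The cell below is a minimal sketch, assuming the training cell completed and wrote its checkpoint to `./resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased`; the example sentence pair is illustrative, not from the MRPC dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: reload the distilled student and score one (hypothetical) sentence pair.\n",
    "# Assumes the checkpoint directory written by the training cell above exists.\n",
    "import torch\n",
    "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
    "\n",
    "ckpt = './resource/ckpt/glue/mrpc/kd/mrpc-bert-base-uncased_from_bert-large-uncased'\n",
    "tokenizer = AutoTokenizer.from_pretrained(ckpt)\n",
    "model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()\n",
    "\n",
    "inputs = tokenizer('The company said it expects revenue to grow.',\n",
    "                   'Revenue is expected to grow, the company said.',\n",
    "                   return_tensors='pt')\n",
    "with torch.no_grad():\n",
    "    logits = model(**inputs).logits\n",
    "# For MRPC, index 1 is the 'paraphrase' label.\n",
    "print('paraphrase probability:', torch.softmax(logits, dim=-1)[0, 1].item())"
   ]
  },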
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oCFuvFRv14Ky"
   },
   "source": [
    "### 4.4 STS-B task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "tDUYbl2fpxu8",
    "outputId": "fbf3715b-78ac-45cc-8aed-86d3b800345c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-02 03:34:57.164794: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/02 03:34:58\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)\n",
      "2021/06/02 03:34:59\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/02 03:34:59\tINFO\tfilelock\tLock 139701413451152 acquired on /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpwpbkq2lg\n",
      "Downloading: 100% 760/760 [00:00<00:00, 702kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53\n",
      "creating metadata file for /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53\n",
      "2021/06/02 03:34:59\tINFO\tfilelock\tLock 139701413451152 released on /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"stsb\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"regression\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"stsb\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"regression\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 03:35:00\tINFO\tfilelock\tLock 139701413451856 acquired on /root/.cache/huggingface/transformers/c33cb68338f70fc33e877d6a8d67e166d9378af1730f095a407ebbf0a5bc11dd.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp9cili1_k\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.74MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/c33cb68338f70fc33e877d6a8d67e166d9378af1730f095a407ebbf0a5bc11dd.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/c33cb68338f70fc33e877d6a8d67e166d9378af1730f095a407ebbf0a5bc11dd.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 03:35:00\tINFO\tfilelock\tLock 139701413451856 released on /root/.cache/huggingface/transformers/c33cb68338f70fc33e877d6a8d67e166d9378af1730f095a407ebbf0a5bc11dd.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 03:35:01\tINFO\tfilelock\tLock 139701413451856 acquired on /root/.cache/huggingface/transformers/47f53c5bca866c54a08d55d7873a2b244cf5f3d7fd50de84fd81b7e8a81cda71.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpq6639wco\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.30MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/47f53c5bca866c54a08d55d7873a2b244cf5f3d7fd50de84fd81b7e8a81cda71.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "creating metadata file for /root/.cache/huggingface/transformers/47f53c5bca866c54a08d55d7873a2b244cf5f3d7fd50de84fd81b7e8a81cda71.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "2021/06/02 03:35:01\tINFO\tfilelock\tLock 139701413451856 released on /root/.cache/huggingface/transformers/47f53c5bca866c54a08d55d7873a2b244cf5f3d7fd50de84fd81b7e8a81cda71.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "2021/06/02 03:35:02\tINFO\tfilelock\tLock 139701413047952 acquired on /root/.cache/huggingface/transformers/a7b5c9cb597c41920ba65a959739a6d6cdae44cb22e58f8324a913a01e2c1038.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpkw9x1iov\n",
      "Downloading: 100% 112/112 [00:00<00:00, 97.5kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/a7b5c9cb597c41920ba65a959739a6d6cdae44cb22e58f8324a913a01e2c1038.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/a7b5c9cb597c41920ba65a959739a6d6cdae44cb22e58f8324a913a01e2c1038.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/02 03:35:02\tINFO\tfilelock\tLock 139701413047952 released on /root/.cache/huggingface/transformers/a7b5c9cb597c41920ba65a959739a6d6cdae44cb22e58f8324a913a01e2c1038.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/02 03:35:02\tINFO\tfilelock\tLock 139701413102864 acquired on /root/.cache/huggingface/transformers/3c2bf518495f2a17ff3f7ff4c7eb135ac69e9dd3ee5980e60ebb78563306f662.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp_fg2zniu\n",
      "Downloading: 100% 304/304 [00:00<00:00, 245kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/3c2bf518495f2a17ff3f7ff4c7eb135ac69e9dd3ee5980e60ebb78563306f662.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/3c2bf518495f2a17ff3f7ff4c7eb135ac69e9dd3ee5980e60ebb78563306f662.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:35:03\tINFO\tfilelock\tLock 139701413102864 released on /root/.cache/huggingface/transformers/3c2bf518495f2a17ff3f7ff4c7eb135ac69e9dd3ee5980e60ebb78563306f662.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/c33cb68338f70fc33e877d6a8d67e166d9378af1730f095a407ebbf0a5bc11dd.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/47f53c5bca866c54a08d55d7873a2b244cf5f3d7fd50de84fd81b7e8a81cda71.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/a7b5c9cb597c41920ba65a959739a6d6cdae44cb22e58f8324a913a01e2c1038.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/3c2bf518495f2a17ff3f7ff4c7eb135ac69e9dd3ee5980e60ebb78563306f662.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:35:03\tINFO\tfilelock\tLock 139701453001168 acquired on /root/.cache/huggingface/transformers/e991cf8cddb711d4732ceb62de08766bf1f498225a38105d29088cad5ce16619.13b6232a169dd237d3c1f0597f446e2e2a8183ebeeaeba888969639d094dd93d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpdtygeolc\n",
      "Downloading: 100% 1.34G/1.34G [00:32<00:00, 40.7MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/e991cf8cddb711d4732ceb62de08766bf1f498225a38105d29088cad5ce16619.13b6232a169dd237d3c1f0597f446e2e2a8183ebeeaeba888969639d094dd93d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/e991cf8cddb711d4732ceb62de08766bf1f498225a38105d29088cad5ce16619.13b6232a169dd237d3c1f0597f446e2e2a8183ebeeaeba888969639d094dd93d\n",
      "2021/06/02 03:35:36\tINFO\tfilelock\tLock 139701453001168 released on /root/.cache/huggingface/transformers/e991cf8cddb711d4732ceb62de08766bf1f498225a38105d29088cad5ce16619.13b6232a169dd237d3c1f0597f446e2e2a8183ebeeaeba888969639d094dd93d.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/e991cf8cddb711d4732ceb62de08766bf1f498225a38105d29088cad5ce16619.13b6232a169dd237d3c1f0597f446e2e2a8183ebeeaeba888969639d094dd93d\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-stsb.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"stsb\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading and preparing dataset glue/stsb (download: 784.05 KiB, generated: 1.09 MiB, post-processed: Unknown size, total: 1.86 MiB) to /root/.cache/huggingface/datasets/glue/stsb/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 803k/803k [00:00<00:00, 6.43MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/stsb/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 6/6 [00:00<00:00,  7.59ba/s]\n",
      "100% 2/2 [00:00<00:00, 19.25ba/s]\n",
      "100% 2/2 [00:00<00:00, 22.14ba/s]\n",
      "2021/06/02 03:35:47\tINFO\t__main__\tStart training\n",
      "2021/06/02 03:35:47\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/02 03:35:47\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/02 03:35:47\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/02 03:35:47\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/02 03:35:47\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss + 1.0 * MSELoss()\n",
      "2021/06/02 03:35:47\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/02 03:35:51\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [  0/180]  eta: 0:01:04  lr: 2.9944444444444443e-05  sample/s: 11.421377859137019  loss: 344.6131 (344.6131)  time: 0.3558  data: 0.0056  max mem: 2862\n",
      "2021/06/02 03:36:05\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 50/180]  eta: 0:00:36  lr: 2.716666666666667e-05  sample/s: 11.839344903191297  loss: 32.5752 (136.4970)  time: 0.2917  data: 0.0037  max mem: 4509\n",
      "2021/06/02 03:36:20\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [100/180]  eta: 0:00:22  lr: 2.438888888888889e-05  sample/s: 12.482360346230937  loss: 14.3937 (77.4924)  time: 0.3001  data: 0.0036  max mem: 4512\n",
      "2021/06/02 03:36:35\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [150/180]  eta: 0:00:08  lr: 2.161111111111111e-05  sample/s: 18.457554881282167  loss: 14.4881 (56.5892)  time: 0.2937  data: 0.0035  max mem: 5074\n",
      "2021/06/02 03:36:44\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:52\n",
      "2021/06/02 03:36:46\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
      "2021/06/02 03:36:46\tINFO\t__main__\tValidation: pearson = 0.8887942064423422, spearmanr = 0.8855088032881668\n",
      "2021/06/02 03:36:46\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:36:47\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [  0/180]  eta: 0:00:53  lr: 1.9944444444444447e-05  sample/s: 13.577578225655037  loss: 3.8488 (3.8488)  time: 0.2998  data: 0.0052  max mem: 5074\n",
      "2021/06/02 03:37:02\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 50/180]  eta: 0:00:39  lr: 1.7166666666666666e-05  sample/s: 15.136700409606814  loss: 5.3039 (6.4270)  time: 0.2907  data: 0.0037  max mem: 5074\n",
      "2021/06/02 03:37:17\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [100/180]  eta: 0:00:23  lr: 1.438888888888889e-05  sample/s: 13.321721344017888  loss: 5.2473 (6.1499)  time: 0.2922  data: 0.0036  max mem: 5074\n",
      "2021/06/02 03:37:32\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [150/180]  eta: 0:00:09  lr: 1.161111111111111e-05  sample/s: 13.83419790803432  loss: 5.4968 (5.9879)  time: 0.2930  data: 0.0036  max mem: 5077\n",
      "2021/06/02 03:37:40\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:53\n",
      "2021/06/02 03:37:42\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
      "2021/06/02 03:37:42\tINFO\t__main__\tValidation: pearson = 0.8969660575803039, spearmanr = 0.892996418766108\n",
      "2021/06/02 03:37:42\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 03:37:44\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [  0/180]  eta: 0:00:51  lr: 9.944444444444445e-06  sample/s: 14.27816575973638  loss: 2.7665 (2.7665)  time: 0.2857  data: 0.0055  max mem: 5077\n",
      "2021/06/02 03:37:59\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 50/180]  eta: 0:00:39  lr: 7.166666666666667e-06  sample/s: 12.555963762774146  loss: 1.7601 (2.2307)  time: 0.2996  data: 0.0036  max mem: 5077\n",
      "2021/06/02 03:38:14\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [100/180]  eta: 0:00:23  lr: 4.388888888888889e-06  sample/s: 15.292400384652195  loss: 1.9188 (2.2212)  time: 0.3016  data: 0.0036  max mem: 5077\n",
      "2021/06/02 03:38:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [150/180]  eta: 0:00:08  lr: 1.6111111111111111e-06  sample/s: 13.673106372089194  loss: 1.5537 (2.0758)  time: 0.2988  data: 0.0036  max mem: 5077\n",
      "2021/06/02 03:38:37\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:53\n",
      "2021/06/02 03:38:39\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
      "2021/06/02 03:38:39\tINFO\t__main__\tValidation: pearson = 0.8992889526709605, spearmanr = 0.8953464781265897\n",
      "2021/06/02 03:38:39\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "tokenizer config file saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/02 03:38:40\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/02 03:38:44\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
      "2021/06/02 03:38:44\tINFO\t__main__\tTest: pearson = 0.9034280099511216, spearmanr = 0.901079787020528\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"stsb\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/stsb/kd/stsb-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 03:38:47\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/02 03:38:49\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
      "2021/06/02 03:38:49\tINFO\t__main__\tTest: pearson = 0.8992889526709605, spearmanr = 0.8953464781265897\n",
      "2021/06/02 03:38:49\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/02 03:38:49\tINFO\t__main__\tstsb/test: 1379 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task stsb \\\n",
    "  --run_log log/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sxS1o7i118Eq"
   },
   "source": [
    "### 4.5 QQP task\n",
    "Distill `bert-base-uncased` (student) from a fine-tuned `bert-large-uncased` teacher on QQP (Quora Question Pairs), then evaluate the student and generate test predictions for the GLUE leaderboard."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UtA-gDQYp2Hf",
    "outputId": "f61a3ada-ffd8-4122-b739-2201e3acf8a7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-02 03:38:54.293623: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/02 03:38:56\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)\n",
      "2021/06/02 03:38:56\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/02 03:38:56\tINFO\tfilelock\tLock 140443960170000 acquired on /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp9te7c3ab\n",
      "Downloading: 100% 698/698 [00:00<00:00, 603kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41\n",
      "creating metadata file for /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41\n",
      "2021/06/02 03:38:56\tINFO\tfilelock\tLock 140443960170000 released on /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qqp\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qqp\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 03:38:57\tINFO\tfilelock\tLock 140443960153872 acquired on /root/.cache/huggingface/transformers/3ee6604aa6258f1b4373e9fcadfca3fefec09a87b2759cd50d9cb6933d94860b.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpxn0ux6fh\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.74MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/3ee6604aa6258f1b4373e9fcadfca3fefec09a87b2759cd50d9cb6933d94860b.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/3ee6604aa6258f1b4373e9fcadfca3fefec09a87b2759cd50d9cb6933d94860b.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 03:38:57\tINFO\tfilelock\tLock 140443960153872 released on /root/.cache/huggingface/transformers/3ee6604aa6258f1b4373e9fcadfca3fefec09a87b2759cd50d9cb6933d94860b.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 03:38:58\tINFO\tfilelock\tLock 140443959680592 acquired on /root/.cache/huggingface/transformers/bbd1c7d170a568d80fe40c1392226fd011ecc648f9c223c0136c51ff0884c909.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp1icel1lq\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.34MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/bbd1c7d170a568d80fe40c1392226fd011ecc648f9c223c0136c51ff0884c909.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "creating metadata file for /root/.cache/huggingface/transformers/bbd1c7d170a568d80fe40c1392226fd011ecc648f9c223c0136c51ff0884c909.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "2021/06/02 03:38:58\tINFO\tfilelock\tLock 140443959680592 released on /root/.cache/huggingface/transformers/bbd1c7d170a568d80fe40c1392226fd011ecc648f9c223c0136c51ff0884c909.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829.lock\n",
      "2021/06/02 03:38:59\tINFO\tfilelock\tLock 140443961258256 acquired on /root/.cache/huggingface/transformers/178af5611f560962f8aff2195e99d14fa27d8d1d34bf73a805dc36ea8f67051e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprvl20m3p\n",
      "Downloading: 100% 112/112 [00:00<00:00, 107kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/178af5611f560962f8aff2195e99d14fa27d8d1d34bf73a805dc36ea8f67051e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/178af5611f560962f8aff2195e99d14fa27d8d1d34bf73a805dc36ea8f67051e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/02 03:38:59\tINFO\tfilelock\tLock 140443961258256 released on /root/.cache/huggingface/transformers/178af5611f560962f8aff2195e99d14fa27d8d1d34bf73a805dc36ea8f67051e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/02 03:38:59\tINFO\tfilelock\tLock 140443927836880 acquired on /root/.cache/huggingface/transformers/aed86e8623bdb9c9299ed2490a8251e31870ef9d5c7c357ea2f771e9d107de83.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp8vv1gywz\n",
      "Downloading: 100% 304/304 [00:00<00:00, 289kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/aed86e8623bdb9c9299ed2490a8251e31870ef9d5c7c357ea2f771e9d107de83.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/aed86e8623bdb9c9299ed2490a8251e31870ef9d5c7c357ea2f771e9d107de83.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:39:00\tINFO\tfilelock\tLock 140443927836880 released on /root/.cache/huggingface/transformers/aed86e8623bdb9c9299ed2490a8251e31870ef9d5c7c357ea2f771e9d107de83.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/3ee6604aa6258f1b4373e9fcadfca3fefec09a87b2759cd50d9cb6933d94860b.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/bbd1c7d170a568d80fe40c1392226fd011ecc648f9c223c0136c51ff0884c909.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/178af5611f560962f8aff2195e99d14fa27d8d1d34bf73a805dc36ea8f67051e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/aed86e8623bdb9c9299ed2490a8251e31870ef9d5c7c357ea2f771e9d107de83.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 03:39:00\tINFO\tfilelock\tLock 140443927789328 acquired on /root/.cache/huggingface/transformers/02b89a954d5801fe4697b7762202140277ff423972dcd3031ad702d2acdf53be.e1aab307d0e91d479c718b89c31b899cf5bb68246e073ef69be407f796cd3548.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpyzfra235\n",
      "Downloading: 100% 1.34G/1.34G [00:44<00:00, 30.1MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/02b89a954d5801fe4697b7762202140277ff423972dcd3031ad702d2acdf53be.e1aab307d0e91d479c718b89c31b899cf5bb68246e073ef69be407f796cd3548\n",
      "creating metadata file for /root/.cache/huggingface/transformers/02b89a954d5801fe4697b7762202140277ff423972dcd3031ad702d2acdf53be.e1aab307d0e91d479c718b89c31b899cf5bb68246e073ef69be407f796cd3548\n",
      "2021/06/02 03:39:45\tINFO\tfilelock\tLock 140443927789328 released on /root/.cache/huggingface/transformers/02b89a954d5801fe4697b7762202140277ff423972dcd3031ad702d2acdf53be.e1aab307d0e91d479c718b89c31b899cf5bb68246e073ef69be407f796cd3548.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/02b89a954d5801fe4697b7762202140277ff423972dcd3031ad702d2acdf53be.e1aab307d0e91d479c718b89c31b899cf5bb68246e073ef69be407f796cd3548\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-qqp.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qqp\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading and preparing dataset glue/qqp (download: 39.76 MiB, generated: 106.55 MiB, post-processed: Unknown size, total: 146.32 MiB) to /root/.cache/huggingface/datasets/glue/qqp/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 41.7M/41.7M [00:00<00:00, 51.2MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/qqp/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 364/364 [00:26<00:00, 13.56ba/s]\n",
      "100% 41/41 [00:02<00:00, 13.72ba/s]\n",
      "100% 391/391 [00:29<00:00, 13.38ba/s]\n",
      "2021/06/02 03:41:21\tINFO\t__main__\tStart training\n",
      "2021/06/02 03:41:21\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/02 03:41:21\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/02 03:41:21\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/02 03:41:21\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/02 03:41:21\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/02 03:41:21\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/02 03:41:26\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [    0/11371]  eta: 1:11:12  lr: 9.999706856623575e-05  sample/s: 11.353193666329894  loss: 0.1475 (0.1475)  time: 0.3757  data: 0.0234  max mem: 3034\n",
      "2021/06/02 03:46:21\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 1000/11371]  eta: 0:51:05  lr: 9.706563480198166e-05  sample/s: 12.375882057009319  loss: 0.0362 (0.0519)  time: 0.3043  data: 0.0038  max mem: 5081\n",
      "2021/06/02 03:51:18\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 2000/11371]  eta: 0:46:16  lr: 9.413420103772755e-05  sample/s: 18.223209446359153  loss: 0.0316 (0.0428)  time: 0.3065  data: 0.0037  max mem: 5081\n",
      "2021/06/02 03:56:18\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 3000/11371]  eta: 0:41:29  lr: 9.120276727347346e-05  sample/s: 15.257410120488064  loss: 0.0259 (0.0381)  time: 0.2740  data: 0.0035  max mem: 5081\n",
      "2021/06/02 04:01:13\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 4000/11371]  eta: 0:36:27  lr: 8.827133350921937e-05  sample/s: 12.820824841375188  loss: 0.0241 (0.0349)  time: 0.2839  data: 0.0037  max mem: 5081\n",
      "2021/06/02 04:06:10\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 5000/11371]  eta: 0:31:31  lr: 8.533989974496526e-05  sample/s: 13.749213058838858  loss: 0.0220 (0.0327)  time: 0.2984  data: 0.0037  max mem: 5081\n",
      "2021/06/02 04:11:09\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 6000/11371]  eta: 0:26:36  lr: 8.240846598071117e-05  sample/s: 12.221539411899256  loss: 0.0177 (0.0310)  time: 0.3063  data: 0.0037  max mem: 5081\n",
      "2021/06/02 04:16:04\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 7000/11371]  eta: 0:21:37  lr: 7.947703221645707e-05  sample/s: 13.1976139663051  loss: 0.0211 (0.0296)  time: 0.3185  data: 0.0038  max mem: 5081\n",
      "2021/06/02 04:21:03\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 8000/11371]  eta: 0:16:41  lr: 7.654559845220297e-05  sample/s: 15.387663796210779  loss: 0.0258 (0.0285)  time: 0.3008  data: 0.0038  max mem: 5081\n",
      "2021/06/02 04:26:00\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 9000/11371]  eta: 0:11:44  lr: 7.361416468794888e-05  sample/s: 12.713065183078058  loss: 0.0169 (0.0274)  time: 0.2899  data: 0.0037  max mem: 5081\n",
      "2021/06/02 04:30:59\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [10000/11371]  eta: 0:06:47  lr: 7.068273092369478e-05  sample/s: 16.37928135937401  loss: 0.0208 (0.0266)  time: 0.2874  data: 0.0038  max mem: 5081\n",
      "2021/06/02 04:35:56\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [11000/11371]  eta: 0:01:50  lr: 6.775129715944069e-05  sample/s: 8.062519402755552  loss: 0.0157 (0.0259)  time: 0.3090  data: 0.0038  max mem: 5081\n",
      "2021/06/02 04:37:47\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:56:22\n",
      "2021/06/02 04:38:53\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
      "2021/06/02 04:38:53\tINFO\t__main__\tValidation: accuracy = 0.8999752658916647, f1 = 0.8631564699512723\n",
      "2021/06/02 04:38:53\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 04:38:54\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [    0/11371]  eta: 0:48:17  lr: 6.666373523290241e-05  sample/s: 17.249064149818278  loss: 0.0076 (0.0076)  time: 0.2549  data: 0.0229  max mem: 5081\n",
      "2021/06/02 04:43:50\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 1000/11371]  eta: 0:51:12  lr: 6.373230146864831e-05  sample/s: 19.282161615465068  loss: 0.0099 (0.0103)  time: 0.2872  data: 0.0037  max mem: 5081\n",
      "2021/06/02 04:48:49\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 2000/11371]  eta: 0:46:26  lr: 6.0800867704394224e-05  sample/s: 14.906005504942543  loss: 0.0109 (0.0102)  time: 0.2830  data: 0.0037  max mem: 5081\n",
      "2021/06/02 04:53:49\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 3000/11371]  eta: 0:41:35  lr: 5.7869433940140126e-05  sample/s: 14.619049569241573  loss: 0.0092 (0.0102)  time: 0.3129  data: 0.0038  max mem: 5081\n",
      "2021/06/02 04:58:45\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 4000/11371]  eta: 0:36:34  lr: 5.493800017588603e-05  sample/s: 10.34794477795411  loss: 0.0070 (0.0100)  time: 0.2890  data: 0.0037  max mem: 5081\n",
      "2021/06/02 05:03:43\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 5000/11371]  eta: 0:31:37  lr: 5.200656641163193e-05  sample/s: 12.790037057620369  loss: 0.0098 (0.0100)  time: 0.2894  data: 0.0038  max mem: 5081\n",
      "2021/06/02 05:08:46\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 6000/11371]  eta: 0:26:43  lr: 4.907513264737784e-05  sample/s: 13.166211370808067  loss: 0.0107 (0.0100)  time: 0.2869  data: 0.0037  max mem: 5081\n",
      "2021/06/02 05:13:41\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 7000/11371]  eta: 0:21:43  lr: 4.614369888312374e-05  sample/s: 19.164937195286335  loss: 0.0095 (0.0099)  time: 0.2938  data: 0.0038  max mem: 5081\n",
      "2021/06/02 05:18:39\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 8000/11371]  eta: 0:16:44  lr: 4.3212265118869647e-05  sample/s: 15.181819333517934  loss: 0.0060 (0.0098)  time: 0.2930  data: 0.0038  max mem: 5081\n",
      "2021/06/02 05:23:38\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 9000/11371]  eta: 0:11:46  lr: 4.028083135461555e-05  sample/s: 14.830452200539572  loss: 0.0066 (0.0098)  time: 0.3023  data: 0.0038  max mem: 5081\n",
      "2021/06/02 05:28:32\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [10000/11371]  eta: 0:06:48  lr: 3.734939759036144e-05  sample/s: 18.68228607571256  loss: 0.0073 (0.0096)  time: 0.2791  data: 0.0037  max mem: 5081\n",
      "2021/06/02 05:33:28\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [11000/11371]  eta: 0:01:50  lr: 3.441796382610735e-05  sample/s: 15.3090756455881  loss: 0.0060 (0.0096)  time: 0.3059  data: 0.0038  max mem: 5081\n",
      "2021/06/02 05:35:18\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:56:24\n",
      "2021/06/02 05:36:23\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
      "2021/06/02 05:36:23\tINFO\t__main__\tValidation: accuracy = 0.907642839475637, f1 = 0.8791507540941161\n",
      "2021/06/02 05:36:23\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 05:36:25\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [    0/11371]  eta: 1:02:15  lr: 3.333040189956908e-05  sample/s: 13.86106888392444  loss: 0.0144 (0.0144)  time: 0.3285  data: 0.0399  max mem: 5081\n",
      "2021/06/02 05:41:22\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 1000/11371]  eta: 0:51:19  lr: 3.039896813531498e-05  sample/s: 16.50386252499845  loss: 0.0027 (0.0042)  time: 0.3085  data: 0.0038  max mem: 5081\n",
      "2021/06/02 05:46:19\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 2000/11371]  eta: 0:46:24  lr: 2.746753437106089e-05  sample/s: 9.212950571841363  loss: 0.0038 (0.0041)  time: 0.3131  data: 0.0037  max mem: 5081\n",
      "2021/06/02 05:51:19\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 3000/11371]  eta: 0:41:33  lr: 2.453610060680679e-05  sample/s: 16.449766939306173  loss: 0.0032 (0.0041)  time: 0.3131  data: 0.0037  max mem: 5081\n",
      "2021/06/02 05:56:14\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 4000/11371]  eta: 0:36:31  lr: 2.1604666842552692e-05  sample/s: 13.479280803620567  loss: 0.0026 (0.0040)  time: 0.2889  data: 0.0037  max mem: 5081\n",
      "2021/06/02 06:01:12\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 5000/11371]  eta: 0:31:34  lr: 1.8673233078298597e-05  sample/s: 12.44085608479576  loss: 0.0033 (0.0040)  time: 0.3045  data: 0.0038  max mem: 5081\n",
      "2021/06/02 06:06:09\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 6000/11371]  eta: 0:26:36  lr: 1.5741799314044502e-05  sample/s: 12.830109724053335  loss: 0.0037 (0.0040)  time: 0.2957  data: 0.0038  max mem: 5081\n",
      "2021/06/02 06:11:07\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 7000/11371]  eta: 0:21:39  lr: 1.2810365549790403e-05  sample/s: 15.13350544012614  loss: 0.0029 (0.0040)  time: 0.2724  data: 0.0036  max mem: 5081\n",
      "2021/06/02 06:16:03\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 8000/11371]  eta: 0:16:42  lr: 9.878931785536307e-06  sample/s: 14.80553349359144  loss: 0.0034 (0.0039)  time: 0.2689  data: 0.0037  max mem: 5081\n",
      "2021/06/02 06:21:01\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 9000/11371]  eta: 0:11:45  lr: 6.947498021282209e-06  sample/s: 13.400942375172033  loss: 0.0025 (0.0039)  time: 0.2937  data: 0.0036  max mem: 5081\n",
      "2021/06/02 06:25:55\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [10000/11371]  eta: 0:06:47  lr: 4.016064257028113e-06  sample/s: 16.58661915924116  loss: 0.0024 (0.0039)  time: 0.2769  data: 0.0037  max mem: 5081\n",
      "2021/06/02 06:30:51\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [11000/11371]  eta: 0:01:50  lr: 1.084630492774016e-06  sample/s: 18.37272150852152  loss: 0.0041 (0.0038)  time: 0.2898  data: 0.0038  max mem: 5081\n",
      "2021/06/02 06:32:41\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:56:16\n",
      "2021/06/02 06:33:46\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
      "2021/06/02 06:33:46\tINFO\t__main__\tValidation: accuracy = 0.9122433836260203, f1 = 0.881906537078951\n",
      "2021/06/02 06:33:46\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "tokenizer config file saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/02 06:33:48\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/02 06:36:18\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
      "2021/06/02 06:36:18\tINFO\t__main__\tTest: accuracy = 0.9108088053425674, f1 = 0.8808406582512723\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qqp\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/qqp/kd/qqp-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 06:36:21\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/02 06:37:27\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
      "2021/06/02 06:37:27\tINFO\t__main__\tTest: accuracy = 0.9122433836260203, f1 = 0.881906537078951\n",
      "2021/06/02 06:37:27\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/02 06:37:27\tINFO\t__main__\tqqp/test: 390965 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task_name qqp \\\n",
    "  --log log/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nGATNCSI1_vr"
   },
   "source": [
    "### 4.6 MNLI task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "RMnPiXycp8-B",
    "outputId": "22fce1c1-9c17-4338-cf99-a150f54b25ef"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-02 15:41:39.173367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/02 15:41:41\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)\n",
      "2021/06/02 15:41:41\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/02 15:41:41\tINFO\tfilelock\tLock 140424696020240 acquired on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprpy1e3pe\n",
      "Downloading: 100% 853/853 [00:00<00:00, 736kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09\n",
      "creating metadata file for /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09\n",
      "2021/06/02 15:41:42\tINFO\tfilelock\tLock 140424696020240 released on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\",\n",
      "    \"1\": \"LABEL_1\",\n",
      "    \"2\": \"LABEL_2\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0,\n",
      "    \"LABEL_1\": 1,\n",
      "    \"LABEL_2\": 2\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\",\n",
      "    \"1\": \"LABEL_1\",\n",
      "    \"2\": \"LABEL_2\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0,\n",
      "    \"LABEL_1\": 1,\n",
      "    \"LABEL_2\": 2\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 15:41:42\tINFO\tfilelock\tLock 140424664027984 acquired on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpunwstym6\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.71MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 15:41:43\tINFO\tfilelock\tLock 140424664027984 released on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 15:41:43\tINFO\tfilelock\tLock 140424663639824 acquired on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpnpu11blv\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 3.34MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "creating metadata file for /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "2021/06/02 15:41:44\tINFO\tfilelock\tLock 140424663639824 released on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "2021/06/02 15:41:44\tINFO\tfilelock\tLock 140424663640464 acquired on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpdh4u9sh3\n",
      "Downloading: 100% 112/112 [00:00<00:00, 124kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/02 15:41:44\tINFO\tfilelock\tLock 140424663640464 released on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/02 15:41:45\tINFO\tfilelock\tLock 140424663709520 acquired on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpbpn2yxu3\n",
      "Downloading: 100% 304/304 [00:00<00:00, 306kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 15:41:45\tINFO\tfilelock\tLock 140424663709520 released on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/02 15:41:45\tINFO\tfilelock\tLock 140424664029776 acquired on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpv1bz0o4i\n",
      "Downloading: 100% 1.34G/1.34G [00:22<00:00, 59.3MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa\n",
      "creating metadata file for /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa\n",
      "2021/06/02 15:42:08\tINFO\tfilelock\tLock 140424664029776 released on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-mnli.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 15:42:12\tINFO\tfilelock\tLock 140424513757776 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpl3b1lpde\n",
      "Downloading: 100% 570/570 [00:00<00:00, 478kB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "creating metadata file for /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "2021/06/02 15:42:12\tINFO\tfilelock\tLock 140424513757776 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\",\n",
      "    \"1\": \"LABEL_1\",\n",
      "    \"2\": \"LABEL_2\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0,\n",
      "    \"LABEL_1\": 1,\n",
      "    \"LABEL_2\": 2\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/02 15:42:13\tINFO\tfilelock\tLock 140424513761040 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp2tfzhhav\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 1.75MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/02 15:42:14\tINFO\tfilelock\tLock 140424513761040 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/02 15:42:14\tINFO\tfilelock\tLock 140424513808848 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp987s0v4b\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 2.61MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "creating metadata file for /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "2021/06/02 15:42:14\tINFO\tfilelock\tLock 140424513808848 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
      "2021/06/02 15:42:15\tINFO\tfilelock\tLock 140424513832720 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpprzngz2j\n",
      "Downloading: 100% 28.0/28.0 [00:00<00:00, 24.0kB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "creating metadata file for /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "2021/06/02 15:42:15\tINFO\tfilelock\tLock 140424513832720 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "2021/06/02 15:42:16\tINFO\tfilelock\tLock 140424513757904 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmphqvienp4\n",
      "Downloading: 100% 440M/440M [00:07<00:00, 59.6MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "creating metadata file for /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "2021/06/02 15:42:23\tINFO\tfilelock\tLock 140424513757904 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading: 28.8kB [00:00, 22.2MB/s]       \n",
      "Downloading: 28.7kB [00:00, 26.8MB/s]       \n",
      "Downloading and preparing dataset glue/mnli (download: 298.29 MiB, generated: 78.65 MiB, post-processed: Unknown size, total: 376.95 MiB) to /root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 313M/313M [00:05<00:00, 59.1MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 393/393 [00:38<00:00, 10.33ba/s]\n",
      "100% 10/10 [00:00<00:00, 11.80ba/s]\n",
      "100% 10/10 [00:01<00:00,  9.95ba/s]\n",
      "100% 10/10 [00:00<00:00, 11.61ba/s]\n",
      "100% 10/10 [00:00<00:00, 10.01ba/s]\n",
      "Downloading and preparing dataset glue/ax (download: 217.05 KiB, generated: 232.80 KiB, post-processed: Unknown size, total: 449.85 KiB) to /root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 222kB [00:00, 2.57MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 2/2 [00:00<00:00, 18.07ba/s]\n",
      "Downloading: 5.75kB [00:00, 5.74MB/s]       \n",
      "2021/06/02 15:43:52\tINFO\t__main__\tStart training\n",
      "2021/06/02 15:43:52\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/02 15:43:52\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/02 15:43:52\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/02 15:43:52\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/02 15:43:52\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/02 15:43:52\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/02 15:44:03\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [    0/12272]  eta: 2:00:46  lr: 9.999728378965668e-05  sample/s: 7.339660429690747  loss: 0.1142 (0.1142)  time: 0.5905  data: 0.0455  max mem: 3246\n",
      "2021/06/02 15:49:58\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 1000/12272]  eta: 1:06:40  lr: 9.728107344632768e-05  sample/s: 9.187960569550931  loss: 0.0186 (0.0388)  time: 0.3554  data: 0.0045  max mem: 5103\n",
      "2021/06/02 15:55:59\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 2000/12272]  eta: 1:01:14  lr: 9.45648631029987e-05  sample/s: 12.890369879780044  loss: 0.0158 (0.0296)  time: 0.3693  data: 0.0045  max mem: 5103\n",
      "2021/06/02 16:01:58\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 3000/12272]  eta: 0:55:21  lr: 9.184865275966971e-05  sample/s: 9.630886406715874  loss: 0.0121 (0.0251)  time: 0.3599  data: 0.0043  max mem: 5103\n",
      "2021/06/02 16:07:55\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 4000/12272]  eta: 0:49:21  lr: 8.913244241634072e-05  sample/s: 15.33063281793269  loss: 0.0124 (0.0224)  time: 0.3630  data: 0.0043  max mem: 5103\n",
      "2021/06/02 16:13:54\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 5000/12272]  eta: 0:43:23  lr: 8.641623207301173e-05  sample/s: 10.334762658335062  loss: 0.0117 (0.0206)  time: 0.3547  data: 0.0044  max mem: 5103\n",
      "2021/06/02 16:19:51\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 6000/12272]  eta: 0:37:25  lr: 8.370002172968275e-05  sample/s: 9.739552045587274  loss: 0.0109 (0.0192)  time: 0.3658  data: 0.0043  max mem: 5103\n",
      "2021/06/02 16:25:49\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 7000/12272]  eta: 0:31:27  lr: 8.098381138635376e-05  sample/s: 13.036011157817853  loss: 0.0110 (0.0182)  time: 0.3582  data: 0.0046  max mem: 5103\n",
      "2021/06/02 16:31:45\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 8000/12272]  eta: 0:25:28  lr: 7.826760104302477e-05  sample/s: 8.596378161986124  loss: 0.0079 (0.0173)  time: 0.3617  data: 0.0045  max mem: 5103\n",
      "2021/06/02 16:37:45\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 9000/12272]  eta: 0:19:31  lr: 7.555139069969579e-05  sample/s: 11.641256576181007  loss: 0.0085 (0.0165)  time: 0.3513  data: 0.0043  max mem: 5103\n",
      "2021/06/02 16:43:41\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [10000/12272]  eta: 0:13:32  lr: 7.283518035636681e-05  sample/s: 13.309472792367123  loss: 0.0113 (0.0159)  time: 0.3421  data: 0.0042  max mem: 5103\n",
      "2021/06/02 16:49:36\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [11000/12272]  eta: 0:07:34  lr: 7.011897001303781e-05  sample/s: 14.014636782747942  loss: 0.0078 (0.0153)  time: 0.3678  data: 0.0043  max mem: 5103\n",
      "2021/06/02 16:55:36\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [12000/12272]  eta: 0:01:37  lr: 6.740275966970883e-05  sample/s: 13.807340008262734  loss: 0.0091 (0.0149)  time: 0.3708  data: 0.0045  max mem: 5103\n",
      "2021/06/02 16:57:14\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 1:13:11\n",
      "2021/06/02 16:57:34\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
      "2021/06/02 16:57:34\tINFO\t__main__\tValidation: accuracy = 0.8426897605705552\n",
      "2021/06/02 16:57:34\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 16:57:36\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [    0/12272]  eta: 1:26:11  lr: 6.666395045632334e-05  sample/s: 10.095232249931554  loss: 0.0039 (0.0039)  time: 0.4214  data: 0.0252  max mem: 5103\n",
      "2021/06/02 17:03:32\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 1000/12272]  eta: 1:06:52  lr: 6.394774011299436e-05  sample/s: 8.477443126522832  loss: 0.0050 (0.0053)  time: 0.3562  data: 0.0042  max mem: 5103\n",
      "2021/06/02 17:09:30\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 2000/12272]  eta: 1:01:07  lr: 6.123152976966536e-05  sample/s: 11.18710029319334  loss: 0.0042 (0.0053)  time: 0.3509  data: 0.0042  max mem: 5103\n",
      "2021/06/02 17:15:28\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 3000/12272]  eta: 0:55:13  lr: 5.851531942633638e-05  sample/s: 8.516673215136  loss: 0.0050 (0.0052)  time: 0.3486  data: 0.0042  max mem: 5103\n",
      "2021/06/02 17:21:25\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 4000/12272]  eta: 0:49:14  lr: 5.5799109083007396e-05  sample/s: 12.840095911074851  loss: 0.0044 (0.0052)  time: 0.3629  data: 0.0044  max mem: 5103\n",
      "2021/06/02 17:27:21\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 5000/12272]  eta: 0:43:16  lr: 5.30828987396784e-05  sample/s: 9.236693485784253  loss: 0.0048 (0.0051)  time: 0.3397  data: 0.0042  max mem: 5103\n",
      "2021/06/02 17:33:19\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 6000/12272]  eta: 0:37:19  lr: 5.036668839634942e-05  sample/s: 15.518423554919176  loss: 0.0041 (0.0051)  time: 0.3605  data: 0.0042  max mem: 5103\n",
      "2021/06/02 17:39:18\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 7000/12272]  eta: 0:31:24  lr: 4.765047805302043e-05  sample/s: 11.464584749845393  loss: 0.0041 (0.0050)  time: 0.3474  data: 0.0043  max mem: 5103\n",
      "2021/06/02 17:45:18\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 8000/12272]  eta: 0:25:28  lr: 4.493426770969144e-05  sample/s: 9.60852563118673  loss: 0.0040 (0.0050)  time: 0.3698  data: 0.0044  max mem: 5103\n",
      "2021/06/02 17:51:15\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 9000/12272]  eta: 0:19:30  lr: 4.221805736636245e-05  sample/s: 8.492612782440153  loss: 0.0041 (0.0050)  time: 0.3591  data: 0.0042  max mem: 5103\n",
      "2021/06/02 17:57:13\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [10000/12272]  eta: 0:13:32  lr: 3.9501847023033466e-05  sample/s: 11.7108953131893  loss: 0.0043 (0.0049)  time: 0.3385  data: 0.0043  max mem: 5103\n",
      "2021/06/02 18:03:15\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [11000/12272]  eta: 0:07:35  lr: 3.6785636679704476e-05  sample/s: 9.006469844255625  loss: 0.0034 (0.0049)  time: 0.3743  data: 0.0042  max mem: 5103\n",
      "2021/06/02 18:09:09\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [12000/12272]  eta: 0:01:37  lr: 3.406942633637549e-05  sample/s: 9.670829425643017  loss: 0.0037 (0.0048)  time: 0.3524  data: 0.0042  max mem: 5103\n",
      "2021/06/02 18:10:45\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 1:13:09\n",
      "2021/06/02 18:11:05\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
      "2021/06/02 18:11:05\tINFO\t__main__\tValidation: accuracy = 0.8508405501782985\n",
      "2021/06/02 18:11:05\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/02 18:11:07\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [    0/12272]  eta: 1:01:16  lr: 3.3330617122990006e-05  sample/s: 14.562417866659723  loss: 0.0018 (0.0018)  time: 0.2995  data: 0.0249  max mem: 5103\n",
      "2021/06/02 18:17:09\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 1000/12272]  eta: 1:07:57  lr: 3.061440677966102e-05  sample/s: 12.871550074648965  loss: 0.0021 (0.0024)  time: 0.3569  data: 0.0042  max mem: 5103\n",
      "2021/06/02 18:23:06\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 2000/12272]  eta: 1:01:33  lr: 2.789819643633203e-05  sample/s: 9.113609289384469  loss: 0.0020 (0.0023)  time: 0.3516  data: 0.0042  max mem: 5103\n",
      "2021/06/02 18:29:05\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 3000/12272]  eta: 0:55:31  lr: 2.5181986093003048e-05  sample/s: 13.344640776312911  loss: 0.0020 (0.0023)  time: 0.3721  data: 0.0043  max mem: 5103\n",
      "2021/06/02 18:35:04\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 4000/12272]  eta: 0:49:31  lr: 2.2465775749674055e-05  sample/s: 14.003652571657513  loss: 0.0018 (0.0023)  time: 0.3474  data: 0.0042  max mem: 5103\n",
      "2021/06/02 18:40:59\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 5000/12272]  eta: 0:43:26  lr: 1.974956540634507e-05  sample/s: 11.539520129418142  loss: 0.0019 (0.0023)  time: 0.3472  data: 0.0043  max mem: 5103\n",
      "2021/06/02 18:46:55\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 6000/12272]  eta: 0:37:25  lr: 1.7033355063016082e-05  sample/s: 10.809866832299416  loss: 0.0022 (0.0023)  time: 0.3541  data: 0.0043  max mem: 5103\n",
      "2021/06/02 18:52:51\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 7000/12272]  eta: 0:31:25  lr: 1.4317144719687093e-05  sample/s: 11.435117771267873  loss: 0.0021 (0.0023)  time: 0.3276  data: 0.0042  max mem: 5103\n",
      "2021/06/02 18:58:50\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 8000/12272]  eta: 0:25:28  lr: 1.1600934376358105e-05  sample/s: 11.54257667833729  loss: 0.0017 (0.0023)  time: 0.3565  data: 0.0043  max mem: 5103\n",
      "2021/06/02 19:04:48\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 9000/12272]  eta: 0:19:31  lr: 8.884724033029119e-06  sample/s: 13.949522286770948  loss: 0.0020 (0.0023)  time: 0.3393  data: 0.0042  max mem: 5103\n",
      "2021/06/02 19:10:45\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [10000/12272]  eta: 0:13:32  lr: 6.168513689700131e-06  sample/s: 10.394650884993036  loss: 0.0018 (0.0022)  time: 0.3581  data: 0.0042  max mem: 5103\n",
      "2021/06/02 19:16:40\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [11000/12272]  eta: 0:07:34  lr: 3.452303346371143e-06  sample/s: 10.786792409158593  loss: 0.0018 (0.0022)  time: 0.3518  data: 0.0043  max mem: 5103\n",
      "2021/06/02 19:22:38\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [12000/12272]  eta: 0:01:37  lr: 7.360930030421556e-07  sample/s: 12.892866172434966  loss: 0.0019 (0.0022)  time: 0.3638  data: 0.0042  max mem: 5103\n",
      "2021/06/02 19:24:15\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 1:13:08\n",
      "2021/06/02 19:24:35\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
      "2021/06/02 19:24:35\tINFO\t__main__\tValidation: accuracy = 0.8527763627101376\n",
      "2021/06/02 19:24:35\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "tokenizer config file saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/02 19:24:36\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/02 19:25:21\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
      "2021/06/02 19:25:21\tINFO\t__main__\tTest: accuracy = 0.8665308201732043\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"mnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"id2label\": {\n",
      "    \"0\": \"LABEL_0\",\n",
      "    \"1\": \"LABEL_1\",\n",
      "    \"2\": \"LABEL_2\"\n",
      "  },\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"label2id\": {\n",
      "    \"LABEL_0\": 0,\n",
      "    \"LABEL_1\": 1,\n",
      "    \"LABEL_2\": 2\n",
      "  },\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/02 19:25:25\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/02 19:25:45\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
      "2021/06/02 19:25:45\tINFO\t__main__\tTest: accuracy = 0.8527763627101376\n",
      "2021/06/02 19:25:45\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/02 19:25:45\tINFO\t__main__\tmnli/test_m: 9796 samples\n",
      "2021/06/02 19:26:05\tINFO\t__main__\tmnli/test_mm: 9847 samples\n",
      "2021/06/02 19:26:25\tINFO\t__main__\tax/test_ax: 1104 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task mnli \\\n",
    "  --run_log log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dNMwSfQx2DN_"
   },
   "source": [
    "### 4.7 QNLI task\n",
    "Distill a `bert-base-uncased` student from the fine-tuned `bert-large-uncased-qnli` teacher on QNLI, then generate predictions for the GLUE leaderboard submission."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "tm6AEL9cqAnd",
    "outputId": "9ed726a0-7573-4dca-80b2-3139f6b8dd82"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-03 03:06:20.905872: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/03 03:06:23\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)\n",
      "2021/06/03 03:06:23\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398783211984 acquired on /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpnz95qlkq\n",
      "Downloading: 100% 699/699 [00:00<00:00, 1.05MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2\n",
      "creating metadata file for /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398783211984 released on /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398786847952 acquired on /root/.cache/huggingface/transformers/a89adc171199945c88c6380d6e359e2f2f1d9e33fbf2914840ccbcae2a88cc90.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpcon1w8sa\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 35.0MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/a89adc171199945c88c6380d6e359e2f2f1d9e33fbf2914840ccbcae2a88cc90.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/a89adc171199945c88c6380d6e359e2f2f1d9e33fbf2914840ccbcae2a88cc90.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398786847952 released on /root/.cache/huggingface/transformers/a89adc171199945c88c6380d6e359e2f2f1d9e33fbf2914840ccbcae2a88cc90.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398780472336 acquired on /root/.cache/huggingface/transformers/e40cf262b8ee3363b76e16e6b8d7a7cff091ce90d0df585990d20879a18b62d6.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpcv49zr2d\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 39.7MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/e40cf262b8ee3363b76e16e6b8d7a7cff091ce90d0df585990d20879a18b62d6.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "creating metadata file for /root/.cache/huggingface/transformers/e40cf262b8ee3363b76e16e6b8d7a7cff091ce90d0df585990d20879a18b62d6.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398780472336 released on /root/.cache/huggingface/transformers/e40cf262b8ee3363b76e16e6b8d7a7cff091ce90d0df585990d20879a18b62d6.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398749025360 acquired on /root/.cache/huggingface/transformers/8af4ffda1cdd3bddb5fdcc1e67501f432ccc42db734ab6108c0013a2f653330e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpcjt7h0yw\n",
      "Downloading: 100% 112/112 [00:00<00:00, 157kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/8af4ffda1cdd3bddb5fdcc1e67501f432ccc42db734ab6108c0013a2f653330e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/8af4ffda1cdd3bddb5fdcc1e67501f432ccc42db734ab6108c0013a2f653330e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398749025360 released on /root/.cache/huggingface/transformers/8af4ffda1cdd3bddb5fdcc1e67501f432ccc42db734ab6108c0013a2f653330e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398748590864 acquired on /root/.cache/huggingface/transformers/ceadc7427e9956e9cfd059a42937d2e2622d3da1f2c317b34b12dfc4a7d55d3b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpyk9oqe4f\n",
      "Downloading: 100% 304/304 [00:00<00:00, 514kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/ceadc7427e9956e9cfd059a42937d2e2622d3da1f2c317b34b12dfc4a7d55d3b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/ceadc7427e9956e9cfd059a42937d2e2622d3da1f2c317b34b12dfc4a7d55d3b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398748590864 released on /root/.cache/huggingface/transformers/ceadc7427e9956e9cfd059a42937d2e2622d3da1f2c317b34b12dfc4a7d55d3b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/a89adc171199945c88c6380d6e359e2f2f1d9e33fbf2914840ccbcae2a88cc90.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/e40cf262b8ee3363b76e16e6b8d7a7cff091ce90d0df585990d20879a18b62d6.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/8af4ffda1cdd3bddb5fdcc1e67501f432ccc42db734ab6108c0013a2f653330e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/ceadc7427e9956e9cfd059a42937d2e2622d3da1f2c317b34b12dfc4a7d55d3b.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/03 03:06:23\tINFO\tfilelock\tLock 140398748590864 acquired on /root/.cache/huggingface/transformers/e9885efd6faed99edb75942a797fb277852d1c5e496344391594af18bdb6aa75.142b7305af89599e15e0af127ab907dbd903a36257191ae84062060491593a1e.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpfwjujgyo\n",
      "Downloading: 100% 1.34G/1.34G [00:27<00:00, 48.7MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/e9885efd6faed99edb75942a797fb277852d1c5e496344391594af18bdb6aa75.142b7305af89599e15e0af127ab907dbd903a36257191ae84062060491593a1e\n",
      "creating metadata file for /root/.cache/huggingface/transformers/e9885efd6faed99edb75942a797fb277852d1c5e496344391594af18bdb6aa75.142b7305af89599e15e0af127ab907dbd903a36257191ae84062060491593a1e\n",
      "2021/06/03 03:06:51\tINFO\tfilelock\tLock 140398748590864 released on /root/.cache/huggingface/transformers/e9885efd6faed99edb75942a797fb277852d1c5e496344391594af18bdb6aa75.142b7305af89599e15e0af127ab907dbd903a36257191ae84062060491593a1e.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/e9885efd6faed99edb75942a797fb277852d1c5e496344391594af18bdb6aa75.142b7305af89599e15e0af127ab907dbd903a36257191ae84062060491593a1e\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-qnli.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398599221584 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpdef0muha\n",
      "Downloading: 100% 570/570 [00:00<00:00, 615kB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "creating metadata file for /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398599221584 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598749328 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp27bv8_dt\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 48.0MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598749328 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598722896 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp56r2kfwu\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 36.8MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "creating metadata file for /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598722896 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598765392 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpy53r48r6\n",
      "Downloading: 100% 28.0/28.0 [00:00<00:00, 42.8kB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "creating metadata file for /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598765392 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "2021/06/03 03:06:55\tINFO\tfilelock\tLock 140398598794768 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
      "https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpcuv_0i2t\n",
      "Downloading: 100% 440M/440M [00:07<00:00, 58.6MB/s]\n",
      "storing https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "creating metadata file for /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "2021/06/03 03:07:03\tINFO\tfilelock\tLock 140398598794768 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading: 28.8kB [00:00, 24.9MB/s]       \n",
      "Downloading: 28.7kB [00:00, 32.1MB/s]       \n",
      "Downloading and preparing dataset glue/qnli (download: 10.14 MiB, generated: 27.11 MiB, post-processed: Unknown size, total: 37.24 MiB) to /root/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 10.6M/10.6M [00:00<00:00, 10.9MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 105/105 [00:12<00:00,  8.56ba/s]\n",
      "100% 6/6 [00:00<00:00,  9.78ba/s]\n",
      "100% 6/6 [00:00<00:00,  9.71ba/s]\n",
      "Downloading: 5.75kB [00:00, 6.63MB/s]       \n",
      "2021/06/03 03:07:24\tINFO\t__main__\tStart training\n",
      "2021/06/03 03:07:24\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/03 03:07:24\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/03 03:07:24\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/03 03:07:24\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/03 03:07:24\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/03 03:07:24\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/03 03:07:36\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [   0/3274]  eta: 0:22:51  lr: 4.999490938709021e-05  sample/s: 10.175484052528281  loss: 0.7095 (0.7095)  time: 0.4188  data: 0.0257  max mem: 2695\n",
      "2021/06/03 03:10:51\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 500/3274]  eta: 0:17:58  lr: 4.7449602932193036e-05  sample/s: 8.024768986534996  loss: 0.3140 (0.4333)  time: 0.4150  data: 0.0045  max mem: 5091\n",
      "2021/06/03 03:14:21\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1000/3274]  eta: 0:15:21  lr: 4.490429647729587e-05  sample/s: 11.951150293200818  loss: 0.2883 (0.3825)  time: 0.3943  data: 0.0044  max mem: 5091\n",
      "2021/06/03 03:17:49\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1500/3274]  eta: 0:12:04  lr: 4.23589900223987e-05  sample/s: 8.344598587051888  loss: 0.2849 (0.3574)  time: 0.4431  data: 0.0047  max mem: 5091\n",
      "2021/06/03 03:21:17\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [2000/3274]  eta: 0:08:42  lr: 3.981368356750153e-05  sample/s: 10.63528546660708  loss: 0.2698 (0.3399)  time: 0.4071  data: 0.0046  max mem: 5091\n",
      "2021/06/03 03:24:45\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [2500/3274]  eta: 0:05:18  lr: 3.726837711260436e-05  sample/s: 7.687123397887477  loss: 0.3215 (0.3262)  time: 0.4205  data: 0.0046  max mem: 5091\n",
      "2021/06/03 03:28:10\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [3000/3274]  eta: 0:01:52  lr: 3.472307065770719e-05  sample/s: 9.971611258973265  loss: 0.2358 (0.3142)  time: 0.3955  data: 0.0046  max mem: 5091\n",
      "2021/06/03 03:30:04\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:22:27\n",
      "2021/06/03 03:30:18\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
      "2021/06/03 03:30:18\tINFO\t__main__\tValidation: accuracy = 0.9097565440234303\n",
      "2021/06/03 03:30:18\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/03 03:30:19\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [   0/3274]  eta: 0:27:18  lr: 3.332824272042354e-05  sample/s: 8.17277721297125  loss: 0.1565 (0.1565)  time: 0.5005  data: 0.0111  max mem: 5091\n",
      "2021/06/03 03:33:48\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 500/3274]  eta: 0:19:17  lr: 3.078293626552637e-05  sample/s: 7.788207310245838  loss: 0.1492 (0.1423)  time: 0.4308  data: 0.0044  max mem: 5091\n",
      "2021/06/03 03:37:16\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1000/3274]  eta: 0:15:47  lr: 2.82376298106292e-05  sample/s: 10.279369533583518  loss: 0.0876 (0.1420)  time: 0.4305  data: 0.0045  max mem: 5091\n",
      "2021/06/03 03:40:40\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1500/3274]  eta: 0:12:13  lr: 2.569232335573203e-05  sample/s: 13.161573751561138  loss: 0.0915 (0.1424)  time: 0.3989  data: 0.0044  max mem: 5091\n",
      "2021/06/03 03:44:10\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [2000/3274]  eta: 0:08:49  lr: 2.314701690083486e-05  sample/s: 8.976040481276929  loss: 0.1810 (0.1432)  time: 0.4275  data: 0.0046  max mem: 5091\n",
      "2021/06/03 03:47:40\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [2500/3274]  eta: 0:05:22  lr: 2.060171044593769e-05  sample/s: 13.07498242611565  loss: 0.1526 (0.1436)  time: 0.4021  data: 0.0046  max mem: 5091\n",
      "2021/06/03 03:51:07\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [3000/3274]  eta: 0:01:53  lr: 1.8056403991040522e-05  sample/s: 11.58728336714094  loss: 0.1307 (0.1443)  time: 0.4297  data: 0.0048  max mem: 5091\n",
      "2021/06/03 03:53:00\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:22:40\n",
      "2021/06/03 03:53:14\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
      "2021/06/03 03:53:14\tINFO\t__main__\tValidation: accuracy = 0.9114039904814205\n",
      "2021/06/03 03:53:14\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/03 03:53:15\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [   0/3274]  eta: 0:19:48  lr: 1.6661576053756872e-05  sample/s: 11.305173076015983  loss: 0.0112 (0.0112)  time: 0.3631  data: 0.0093  max mem: 5091\n",
      "2021/06/03 03:56:41\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 500/3274]  eta: 0:19:04  lr: 1.4116269598859703e-05  sample/s: 7.834760916081297  loss: 0.0002 (0.0966)  time: 0.3978  data: 0.0045  max mem: 5091\n",
      "2021/06/03 04:00:10\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1000/3274]  eta: 0:15:43  lr: 1.1570963143962533e-05  sample/s: 10.657160443558812  loss: 0.0247 (0.1165)  time: 0.4150  data: 0.0046  max mem: 5091\n",
      "2021/06/03 04:03:37\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1500/3274]  eta: 0:12:15  lr: 9.025656689065364e-06  sample/s: 12.105839042109046  loss: 0.0809 (0.1307)  time: 0.4100  data: 0.0045  max mem: 5091\n",
      "2021/06/03 04:07:03\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [2000/3274]  eta: 0:08:47  lr: 6.480350234168193e-06  sample/s: 12.197442328803973  loss: 0.1899 (0.1330)  time: 0.4069  data: 0.0045  max mem: 5091\n",
      "2021/06/03 04:10:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [2500/3274]  eta: 0:05:19  lr: 3.935043779271024e-06  sample/s: 14.317999380417612  loss: 0.0008 (0.1358)  time: 0.3931  data: 0.0043  max mem: 5091\n",
      "2021/06/03 04:13:55\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [3000/3274]  eta: 0:01:53  lr: 1.3897373243738546e-06  sample/s: 8.319782598454287  loss: 0.0000 (0.1366)  time: 0.4256  data: 0.0046  max mem: 5091\n",
      "2021/06/03 04:15:46\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:22:31\n",
      "2021/06/03 04:16:00\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:16:00\tINFO\t__main__\tValidation: accuracy = 0.9079260479589969\n",
      "tokenizer config file saved in ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/03 04:16:00\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/03 04:16:33\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:16:33\tINFO\t__main__\tTest: accuracy = 0.9225700164744646\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"qnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/qnli/kd/qnli-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/03 04:16:34\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/03 04:16:48\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:16:48\tINFO\t__main__\tTest: accuracy = 0.9114039904814205\n",
      "2021/06/03 04:16:48\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/03 04:16:48\tINFO\t__main__\tqnli/test: 5463 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task_name qnli \\\n",
    "  --log log/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b99U7uAX2HI5"
   },
   "source": [
    "### 4.8 RTE task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "m-iYN-RSqEwF",
    "outputId": "b0a87128-d319-4b5f-c81f-81889dfb57e5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-03 04:17:06.686595: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/03 04:17:08\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)\n",
      "2021/06/03 04:17:08\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704299298960 acquired on /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpc14qb0k3\n",
      "Downloading: 100% 698/698 [00:00<00:00, 912kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add\n",
      "creating metadata file for /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704299298960 released on /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"rte\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"rte\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704349067344 acquired on /root/.cache/huggingface/transformers/9e48df68eb6ea8a1e3c32b8a0ab33f4892b1fcf0bfd43dea0382cef78c6b8ce6.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp3pd9l02w\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 27.3MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/9e48df68eb6ea8a1e3c32b8a0ab33f4892b1fcf0bfd43dea0382cef78c6b8ce6.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/9e48df68eb6ea8a1e3c32b8a0ab33f4892b1fcf0bfd43dea0382cef78c6b8ce6.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704349067344 released on /root/.cache/huggingface/transformers/9e48df68eb6ea8a1e3c32b8a0ab33f4892b1fcf0bfd43dea0382cef78c6b8ce6.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704291208144 acquired on /root/.cache/huggingface/transformers/9dff6f7ae0a43b39f4abbfa326e2075ad4c830d78b12831e7ac8a8017257917e.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpa8fq2c3o\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 35.0MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/9dff6f7ae0a43b39f4abbfa326e2075ad4c830d78b12831e7ac8a8017257917e.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "creating metadata file for /root/.cache/huggingface/transformers/9dff6f7ae0a43b39f4abbfa326e2075ad4c830d78b12831e7ac8a8017257917e.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704291208144 released on /root/.cache/huggingface/transformers/9dff6f7ae0a43b39f4abbfa326e2075ad4c830d78b12831e7ac8a8017257917e.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704297609104 acquired on /root/.cache/huggingface/transformers/0424525f36c8313755652d4e1279490572c80722a1467de4bf5ec46eaad9db68.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpeqy7mgz2\n",
      "Downloading: 100% 112/112 [00:00<00:00, 173kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/0424525f36c8313755652d4e1279490572c80722a1467de4bf5ec46eaad9db68.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/0424525f36c8313755652d4e1279490572c80722a1467de4bf5ec46eaad9db68.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/03 04:17:08\tINFO\tfilelock\tLock 139704297609104 released on /root/.cache/huggingface/transformers/0424525f36c8313755652d4e1279490572c80722a1467de4bf5ec46eaad9db68.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/03 04:17:09\tINFO\tfilelock\tLock 139704259761936 acquired on /root/.cache/huggingface/transformers/bd9d5bd9b905abcf94dad4fe512dd98c3f368e382e48b60a0f2e018675f42e3f.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprjs7n32q\n",
      "Downloading: 100% 304/304 [00:00<00:00, 472kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/bd9d5bd9b905abcf94dad4fe512dd98c3f368e382e48b60a0f2e018675f42e3f.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/bd9d5bd9b905abcf94dad4fe512dd98c3f368e382e48b60a0f2e018675f42e3f.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/03 04:17:09\tINFO\tfilelock\tLock 139704259761936 released on /root/.cache/huggingface/transformers/bd9d5bd9b905abcf94dad4fe512dd98c3f368e382e48b60a0f2e018675f42e3f.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/9e48df68eb6ea8a1e3c32b8a0ab33f4892b1fcf0bfd43dea0382cef78c6b8ce6.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/9dff6f7ae0a43b39f4abbfa326e2075ad4c830d78b12831e7ac8a8017257917e.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/0424525f36c8313755652d4e1279490572c80722a1467de4bf5ec46eaad9db68.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/bd9d5bd9b905abcf94dad4fe512dd98c3f368e382e48b60a0f2e018675f42e3f.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/03 04:17:09\tINFO\tfilelock\tLock 139704259761680 acquired on /root/.cache/huggingface/transformers/84194db158b82ea28de17e53a78511304057104b54ac5e4e97ad2b2d13e2ec01.9b6d281f5b165990ed33c0343f4207cd1d5cbeabac6adc7aa15b2ae2cd9a720f.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpyw956q67\n",
      "Downloading: 100% 1.34G/1.34G [00:23<00:00, 56.4MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/84194db158b82ea28de17e53a78511304057104b54ac5e4e97ad2b2d13e2ec01.9b6d281f5b165990ed33c0343f4207cd1d5cbeabac6adc7aa15b2ae2cd9a720f\n",
      "creating metadata file for /root/.cache/huggingface/transformers/84194db158b82ea28de17e53a78511304057104b54ac5e4e97ad2b2d13e2ec01.9b6d281f5b165990ed33c0343f4207cd1d5cbeabac6adc7aa15b2ae2cd9a720f\n",
      "2021/06/03 04:17:33\tINFO\tfilelock\tLock 139704259761680 released on /root/.cache/huggingface/transformers/84194db158b82ea28de17e53a78511304057104b54ac5e4e97ad2b2d13e2ec01.9b6d281f5b165990ed33c0343f4207cd1d5cbeabac6adc7aa15b2ae2cd9a720f.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/84194db158b82ea28de17e53a78511304057104b54ac5e4e97ad2b2d13e2ec01.9b6d281f5b165990ed33c0343f4207cd1d5cbeabac6adc7aa15b2ae2cd9a720f\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-rte.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"rte\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading and preparing dataset glue/rte (download: 680.81 KiB, generated: 1.83 MiB, post-processed: Unknown size, total: 2.49 MiB) to /root/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 697k/697k [00:00<00:00, 1.73MB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 3/3 [00:00<00:00,  6.41ba/s]\n",
      "100% 1/1 [00:00<00:00, 25.38ba/s]\n",
      "100% 3/3 [00:00<00:00,  3.93ba/s]\n",
      "2021/06/03 04:17:42\tINFO\t__main__\tStart training\n",
      "2021/06/03 04:17:42\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/03 04:17:42\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/03 04:17:42\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/03 04:17:42\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/03 04:17:42\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/03 04:17:42\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/03 04:17:46\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 0/78]  eta: 0:00:37  lr: 9.957264957264958e-05  sample/s: 8.517875275176072  loss: 0.0094 (0.0094)  time: 0.4762  data: 0.0066  max mem: 3806\n",
      "2021/06/03 04:18:12\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [50/78]  eta: 0:00:14  lr: 7.820512820512821e-05  sample/s: 7.524822354673497  loss: 0.0070 (0.0083)  time: 0.5219  data: 0.0051  max mem: 5113\n",
      "2021/06/03 04:18:26\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:40\n",
      "2021/06/03 04:18:27\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
      "2021/06/03 04:18:27\tINFO\t__main__\tValidation: accuracy = 0.6534296028880866\n",
      "2021/06/03 04:18:27\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/03 04:18:29\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 0/78]  eta: 0:00:40  lr: 6.623931623931624e-05  sample/s: 7.780182192205171  loss: 0.0069 (0.0069)  time: 0.5197  data: 0.0056  max mem: 5113\n",
      "2021/06/03 04:18:56\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [50/78]  eta: 0:00:15  lr: 4.4871794871794874e-05  sample/s: 7.716309134075412  loss: 0.0031 (0.0036)  time: 0.5278  data: 0.0050  max mem: 5113\n",
      "2021/06/03 04:19:10\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:41\n",
      "2021/06/03 04:19:11\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
      "2021/06/03 04:19:11\tINFO\t__main__\tValidation: accuracy = 0.6859205776173285\n",
      "2021/06/03 04:19:11\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/03 04:19:12\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 0/78]  eta: 0:00:38  lr: 3.290598290598291e-05  sample/s: 8.278140356019103  loss: 0.0032 (0.0032)  time: 0.4938  data: 0.0105  max mem: 5113\n",
      "2021/06/03 04:19:38\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [50/78]  eta: 0:00:14  lr: 1.153846153846154e-05  sample/s: 7.6382214832627815  loss: 0.0011 (0.0014)  time: 0.5235  data: 0.0051  max mem: 5113\n",
      "2021/06/03 04:19:52\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:40\n",
      "2021/06/03 04:19:53\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
      "2021/06/03 04:19:53\tINFO\t__main__\tValidation: accuracy = 0.6895306859205776\n",
      "2021/06/03 04:19:53\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "tokenizer config file saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/03 04:19:54\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/03 04:19:56\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
      "2021/06/03 04:19:56\tINFO\t__main__\tTest: accuracy = 0.740072202166065\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"rte\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/rte/kd/rte-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/03 04:19:58\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/03 04:19:59\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
      "2021/06/03 04:19:59\tINFO\t__main__\tTest: accuracy = 0.6895306859205776\n",
      "2021/06/03 04:19:59\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/03 04:19:59\tINFO\t__main__\trte/test: 3000 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task_name rte \\\n",
    "  --log log/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TUjMUFiy2LFP"
   },
   "source": [
    "### 4.9 WNLI task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8pHlvCY0qIVE",
    "outputId": "075f8b42-cce0-4ca9-aa59-3d55dc5feb45"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2021-06-03 04:20:10.071306: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0\n",
      "2021/06/03 04:20:12\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='wnli', test_only=False, world_size=1)\n",
      "2021/06/03 04:20:12\tINFO\t__main__\tDistributed environment: NO\n",
      "Num processes: 1\n",
      "Process index: 0\n",
      "Local process index: 0\n",
      "Device: cuda\n",
      "Use FP16 precision: True\n",
      "\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006998800 acquired on /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmptdathnjw\n",
      "Downloading: 100% 699/699 [00:00<00:00, 997kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6\n",
      "creating metadata file for /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006998800 released on /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6.lock\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"wnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6\n",
      "Model config BertConfig {\n",
      "  \"_name_or_path\": \"bert-large-uncased\",\n",
      "  \"architectures\": [\n",
      "    \"BertForSequenceClassification\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"wnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 1024,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 4096,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 16,\n",
      "  \"num_hidden_layers\": 24,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"problem_type\": \"single_label_classification\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006509968 acquired on /root/.cache/huggingface/transformers/cf61f5aaba9f282404ed7989140c023e41fc9e93e209bd3601291fbe15705724.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp8oaej6tn\n",
      "Downloading: 100% 232k/232k [00:00<00:00, 31.9MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/cf61f5aaba9f282404ed7989140c023e41fc9e93e209bd3601291fbe15705724.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "creating metadata file for /root/.cache/huggingface/transformers/cf61f5aaba9f282404ed7989140c023e41fc9e93e209bd3601291fbe15705724.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006509968 released on /root/.cache/huggingface/transformers/cf61f5aaba9f282404ed7989140c023e41fc9e93e209bd3601291fbe15705724.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006509968 acquired on /root/.cache/huggingface/transformers/9281288633f54e470c1ac806e5424da068442422c45be7c054d66eb15df6d6de.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpiczcafmg\n",
      "Downloading: 100% 466k/466k [00:00<00:00, 38.2MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/9281288633f54e470c1ac806e5424da068442422c45be7c054d66eb15df6d6de.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "creating metadata file for /root/.cache/huggingface/transformers/9281288633f54e470c1ac806e5424da068442422c45be7c054d66eb15df6d6de.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006509968 released on /root/.cache/huggingface/transformers/9281288633f54e470c1ac806e5424da068442422c45be7c054d66eb15df6d6de.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829.lock\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140065974615504 acquired on /root/.cache/huggingface/transformers/212133c79cdb140264335b8c1135eb8f45b4476fa9d9723f8f6139c50531a64a.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpgbeisnr8\n",
      "Downloading: 100% 112/112 [00:00<00:00, 181kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/212133c79cdb140264335b8c1135eb8f45b4476fa9d9723f8f6139c50531a64a.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "creating metadata file for /root/.cache/huggingface/transformers/212133c79cdb140264335b8c1135eb8f45b4476fa9d9723f8f6139c50531a64a.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140065974615504 released on /root/.cache/huggingface/transformers/212133c79cdb140264335b8c1135eb8f45b4476fa9d9723f8f6139c50531a64a.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140065974671888 acquired on /root/.cache/huggingface/transformers/2c8b814447fc39c99d7846c6fed5f97004f79177f6e25cc7c6d62e040a20ae70.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpth0omp00\n",
      "Downloading: 100% 304/304 [00:00<00:00, 517kB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/2c8b814447fc39c99d7846c6fed5f97004f79177f6e25cc7c6d62e040a20ae70.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "creating metadata file for /root/.cache/huggingface/transformers/2c8b814447fc39c99d7846c6fed5f97004f79177f6e25cc7c6d62e040a20ae70.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140065974671888 released on /root/.cache/huggingface/transformers/2c8b814447fc39c99d7846c6fed5f97004f79177f6e25cc7c6d62e040a20ae70.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/cf61f5aaba9f282404ed7989140c023e41fc9e93e209bd3601291fbe15705724.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/9281288633f54e470c1ac806e5424da068442422c45be7c054d66eb15df6d6de.f471bd2d72c48b932f7be40446896b7e97c3be406ee93abfb500399bc606c829\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/212133c79cdb140264335b8c1135eb8f45b4476fa9d9723f8f6139c50531a64a.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d\n",
      "loading file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/2c8b814447fc39c99d7846c6fed5f97004f79177f6e25cc7c6d62e040a20ae70.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426\n",
      "2021/06/03 04:20:12\tINFO\tfilelock\tLock 140066006998800 acquired on /root/.cache/huggingface/transformers/39d9c9f2241519997bb6b35f40ef584230e460a89c2563b42eccd3f51a7afd33.28e1681df5b030942efab0c0d5f2fc4b1241ac1b868c743b09c9877ca97af4cb.lock\n",
      "https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmptke6d4em\n",
      "Downloading: 100% 1.34G/1.34G [00:22<00:00, 59.7MB/s]\n",
      "storing https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/39d9c9f2241519997bb6b35f40ef584230e460a89c2563b42eccd3f51a7afd33.28e1681df5b030942efab0c0d5f2fc4b1241ac1b868c743b09c9877ca97af4cb\n",
      "creating metadata file for /root/.cache/huggingface/transformers/39d9c9f2241519997bb6b35f40ef584230e460a89c2563b42eccd3f51a7afd33.28e1681df5b030942efab0c0d5f2fc4b1241ac1b868c743b09c9877ca97af4cb\n",
      "2021/06/03 04:20:35\tINFO\tfilelock\tLock 140066006998800 released on /root/.cache/huggingface/transformers/39d9c9f2241519997bb6b35f40ef584230e460a89c2563b42eccd3f51a7afd33.28e1681df5b030942efab0c0d5f2fc4b1241ac1b868c743b09c9877ca97af4cb.lock\n",
      "loading weights file https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/39d9c9f2241519997bb6b35f40ef584230e460a89c2563b42eccd3f51a7afd33.28e1681df5b030942efab0c0d5f2fc4b1241ac1b868c743b09c9877ca97af4cb\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at yoshitomo-matsubara/bert-large-uncased-wnli.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"wnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
      "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']\n",
      "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "Downloading and preparing dataset glue/wnli (download: 28.32 KiB, generated: 154.03 KiB, post-processed: Unknown size, total: 182.35 KiB) to /root/.cache/huggingface/datasets/glue/wnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
      "Downloading: 100% 29.0k/29.0k [00:00<00:00, 375kB/s]\n",
      "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/wnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
      "100% 1/1 [00:00<00:00, 11.27ba/s]\n",
      "100% 1/1 [00:00<00:00, 146.90ba/s]\n",
      "100% 1/1 [00:00<00:00, 53.73ba/s]\n",
      "2021/06/03 04:20:42\tINFO\t__main__\tStart training\n",
      "2021/06/03 04:20:42\tINFO\ttorchdistill.models.util\t[teacher model]\n",
      "2021/06/03 04:20:42\tINFO\ttorchdistill.models.util\tUsing the original teacher model\n",
      "2021/06/03 04:20:42\tINFO\ttorchdistill.models.util\t[student model]\n",
      "2021/06/03 04:20:42\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
      "2021/06/03 04:20:42\tINFO\ttorchdistill.core.distillation\tLoss = 1.0 * OrgLoss\n",
      "2021/06/03 04:20:42\tINFO\ttorchdistill.core.distillation\tFreezing the whole teacher model\n",
      "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
      "2021/06/03 04:20:46\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 0/20]  eta: 0:00:08  lr: 9.900000000000001e-05  sample/s: 9.947978499788022  loss: 0.0099 (0.0099)  time: 0.4077  data: 0.0056  max mem: 3432\n",
      "2021/06/03 04:20:53\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:06\n",
      "2021/06/03 04:20:53\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:20:53\tINFO\t__main__\tValidation: accuracy = 0.5633802816901409\n",
      "2021/06/03 04:20:53\tINFO\t__main__\tUpdating ckpt at ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased\n",
      "Configuration saved in ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased/config.json\n",
      "Model weights saved in ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "2021/06/03 04:20:54\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 0/20]  eta: 0:00:06  lr: 7.900000000000001e-05  sample/s: 12.883242938573286  loss: -0.0000 (-0.0000)  time: 0.3171  data: 0.0066  max mem: 4520\n",
      "2021/06/03 04:21:01\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:07\n",
      "2021/06/03 04:21:01\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:21:01\tINFO\t__main__\tValidation: accuracy = 0.5633802816901409\n",
      "2021/06/03 04:21:02\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 0/20]  eta: 0:00:07  lr: 5.9e-05  sample/s: 10.825127755123109  loss: -0.0000 (-0.0000)  time: 0.3737  data: 0.0042  max mem: 4713\n",
      "2021/06/03 04:21:08\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:06\n",
      "2021/06/03 04:21:08\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:21:08\tINFO\t__main__\tValidation: accuracy = 0.5633802816901409\n",
      "2021/06/03 04:21:09\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [ 0/20]  eta: 0:00:06  lr: 3.9000000000000006e-05  sample/s: 12.431278374210969  loss: -0.0001 (-0.0001)  time: 0.3260  data: 0.0042  max mem: 4713\n",
      "2021/06/03 04:21:16\tINFO\ttorchdistill.misc.log\tEpoch: [3] Total time: 0:00:07\n",
      "2021/06/03 04:21:16\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:21:16\tINFO\t__main__\tValidation: accuracy = 0.5633802816901409\n",
      "2021/06/03 04:21:16\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [ 0/20]  eta: 0:00:07  lr: 1.9e-05  sample/s: 10.572980480854246  loss: -0.0001 (-0.0001)  time: 0.3824  data: 0.0041  max mem: 4713\n",
      "2021/06/03 04:21:23\tINFO\ttorchdistill.misc.log\tEpoch: [4] Total time: 0:00:07\n",
      "2021/06/03 04:21:23\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:21:23\tINFO\t__main__\tValidation: accuracy = 0.5633802816901409\n",
      "tokenizer config file saved in ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased/tokenizer_config.json\n",
      "Special tokens file saved in ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased/special_tokens_map.json\n",
      "2021/06/03 04:21:23\tINFO\t__main__\t[Teacher: bert-large-uncased]\n",
      "2021/06/03 04:21:24\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:21:24\tINFO\t__main__\tTest: accuracy = 0.5633802816901409\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"finetuning_task\": \"wnli\",\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
      "Model config BertConfig {\n",
      "  \"architectures\": [\n",
      "    \"BertForMaskedLM\"\n",
      "  ],\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"gradient_checkpointing\": false,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"layer_norm_eps\": 1e-12,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"model_type\": \"bert\",\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"position_embedding_type\": \"absolute\",\n",
      "  \"transformers_version\": \"4.6.1\",\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
      "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
      "loading weights file ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased/pytorch_model.bin\n",
      "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
      "\n",
      "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/wnli/kd/wnli-bert-base-uncased_from_bert-large-uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
      "2021/06/03 04:21:25\tINFO\t__main__\t[Student: bert-base-uncased]\n",
      "2021/06/03 04:21:25\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
      "2021/06/03 04:21:25\tINFO\t__main__\tTest: accuracy = 0.5633802816901409\n",
      "2021/06/03 04:21:25\tINFO\t__main__\tStart prediction for private dataset(s)\n",
      "2021/06/03 04:21:25\tINFO\t__main__\twnli/test: 146 samples\n"
     ]
    }
   ],
   "source": [
    "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
    "  --config torchdistill/configs/sample/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.yaml \\\n",
    "  --task wnli \\\n",
    "  --run_log log/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.txt \\\n",
    "  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "K18Cynocl-GN"
   },
   "source": [
    "## 5. Validate your prediction files for the GLUE leaderboard\n",
    "To make sure your prediction files contain the right number of samples (lines), run `wc -l <your prediction dir path>/*` and check that the line counts match the following:\n",
    "\n",
    "```\n",
    "   1105 AX.tsv\n",
    "   1064 CoLA.tsv\n",
    "   9848 MNLI-mm.tsv\n",
    "   9797 MNLI-m.tsv\n",
    "   1726 MRPC.tsv\n",
    "   5464 QNLI.tsv\n",
    " 390966 QQP.tsv\n",
    "   3001 RTE.tsv\n",
    "   1822 SST-2.tsv\n",
    "   1380 STS-B.tsv\n",
    "    147 WNLI.tsv\n",
    " 426320 total\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "0gynS2fvnVl4",
    "outputId": "52fa264e-7c03-4429-9c5c-fa13ee216f53"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   1105 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/AX.tsv\n",
      "   1064 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/CoLA.tsv\n",
      "   9848 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-mm.tsv\n",
      "   9797 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-m.tsv\n",
      "   1726 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MRPC.tsv\n",
      "   5464 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QNLI.tsv\n",
      " 390966 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QQP.tsv\n",
      "   3001 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/RTE.tsv\n",
      "   1822 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/SST-2.tsv\n",
      "   1380 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/STS-B.tsv\n",
      "    147 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/WNLI.tsv\n",
      " 426320 total\n"
     ]
    }
   ],
   "source": [
    "!wc -l leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/*"
   ]
  },
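  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional sanity check (not part of the original *torchdistill* workflow), you can also compare the line counts programmatically. The expected counts below are copied from the table in Section 5, and the directory path is the one used in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check: verify each prediction file has the expected number of lines\n",
    "from pathlib import Path\n",
    "\n",
    "expected_counts = {\n",
    "    'AX.tsv': 1105, 'CoLA.tsv': 1064, 'MNLI-mm.tsv': 9848, 'MNLI-m.tsv': 9797,\n",
    "    'MRPC.tsv': 1726, 'QNLI.tsv': 5464, 'QQP.tsv': 390966, 'RTE.tsv': 3001,\n",
    "    'SST-2.tsv': 1822, 'STS-B.tsv': 1380, 'WNLI.tsv': 147\n",
    "}\n",
    "pred_dir = Path('leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased')\n",
    "for file_name, expected in expected_counts.items():\n",
    "    # Count lines the same way `wc -l` does (one count per newline-terminated line)\n",
    "    num_lines = sum(1 for _ in (pred_dir / file_name).open())\n",
    "    status = 'OK' if num_lines == expected else 'MISMATCH'\n",
    "    print(f'{file_name}: {num_lines} lines ({status})')"
   ]
  },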
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cxWY9Ts-XEX9"
   },
   "source": [
    "## 6. Zip the prediction files and download the archive to make a submission"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "FQSr6ibGWV72",
    "outputId": "f6e8ede4-687d-4ac9-e8af-ec1dbdc1b723"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/AX.tsv (deflated 82%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/CoLA.tsv (deflated 64%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-mm.tsv (deflated 83%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-m.tsv (deflated 83%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MRPC.tsv (deflated 64%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QNLI.tsv (deflated 84%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QQP.tsv (deflated 73%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/RTE.tsv (deflated 84%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/SST-2.tsv (deflated 64%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/STS-B.tsv (deflated 56%)\n",
      "  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/WNLI.tsv (deflated 62%)\n"
     ]
    }
   ],
   "source": [
    "!zip bert_base_uncased_from_bert_large_uncased-submission.zip leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cc66ysgrWv12"
   },
   "source": [
    "Download the zip file from the \"Files\" menu.  \n",
    "To submit it to the GLUE evaluation system, follow the instructions on their website:\n",
    "https://gluebenchmark.com/"
   ]
  },
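  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, on Google Colab you can trigger the browser download from code with the `google.colab.files` helper (this module is Colab-specific and will raise an `ImportError` in other environments)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Colab-only: triggers a browser download of the submission archive created above\n",
    "from google.colab import files\n",
    "\n",
    "files.download('bert_base_uncased_from_bert_large_uncased-submission.zip')"
   ]
  },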
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AxE8LY2E3Z78"
   },
   "source": [
    "## 7. More sample configurations, models, datasets...\n",
    "You can find more [sample configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/legacy/sample/) in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.\n",
    "If you would like to use larger datasets (e.g., **ImageNet** and **COCO**) and models in `torchvision` (or your own modules), refer to the [official configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/legacy/official) used in published papers.\n",
    "Experiments with such large datasets and models will require your own machine because of Google Colab's limited disk space and session time (12 hours for the free version and 24 hours for Colab Pro).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0BEXt2243OE9"
   },
   "source": [
    "## Colab examples for training student models without teacher models\n",
    "The [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository also provides Colab examples for training student models without teacher models."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "glue_kd_and_submission.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
